{"id":1959,"date":"2019-09-19T18:58:07","date_gmt":"2019-09-20T02:58:07","guid":{"rendered":"http:\/\/wonghoi.humgar.com\/blog\/?p=1959"},"modified":"2025-12-01T04:04:17","modified_gmt":"2025-12-01T12:04:17","slug":"modifying-mutable-like-bytearray-arguments-data-in-python-functions","status":"publish","type":"post","link":"https:\/\/wonghoi.humgar.com\/blog\/2019\/09\/19\/modifying-mutable-like-bytearray-arguments-data-in-python-functions\/","title":{"rendered":"Modifying mutable (like bytearray) arguments&#8217; data in Python functions"},"content":{"rendered":"\n<p>I wanted to write a function that selectively modifies lines read from a file handle and writes them back. Lines read in binary mode are <code>bytes<\/code> objects, which are immutable, so I converted them to <code>bytearray<\/code> so they could be modified in place, because only a few lines meeting certain criteria need to be changed.<\/p>\n\n\n\n<p>When I tried to refactor this operation into a function, I was hoping to pass the mutable <code>bytearray<\/code> as an argument and directly modify the caller&#8217;s content like in C++, given that Python variables work <strong>LIKE<\/strong> reference bindings.<\/p>\n\n\n\n<p>I know <code>bytearray.replace()<\/code> does not modify the data in place, but instead returns the modified line as a new object. Normally, I can simply do this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">line = line.replace(b'\\tCLASS', b'')<\/pre>\n\n\n\n<p>and the code will work. However, the caller sees no change when I do the same to an argument inside a Python function (unless I return <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">line<\/code>&nbsp;as output). 
Although I am well aware that assigning to an existing Python variable orphans the old data and re-purposes the label, this assignment behavior requires careful thought in non-idiomatic situations.<\/p>\n\n\n\n<p>In other words, I wanted this function to have side effects on the variable &#8216;<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">line<\/code>&#8217;, but I wasn&#8217;t doing it right. This is a tempting mistake for people with a C\/C++ background: <a href=\"http:\/\/www.cplusplus.com\/forum\/beginner\/32371\/\"><strong>in C\/C++<\/strong>, it is not possible to shadow an input parameter<\/a> even if we were to explicitly declare it, so in C\/C++ the innocent assignment above has to modify the caller&#8217;s object (passed by reference to the function), as if I had done it directly in the caller.<\/p>\n\n\n\n<p>However, in Python, variables do not need to be declared (the language is dynamically typed). This opens up the possibility of <strong>unwittingly shadowing the input parameters<\/strong>, which is what happened here. 
Mutable arguments can still be modified in place through their methods inside the function, but when you assign to a <strong>variable<\/strong> using the &#8216;=&#8217; operator, the name on the LHS is rebound as a <strong>local<\/strong> variable referring to a new object, which shadows the input parameter.<\/p>\n\n\n\n<p>This means the connection to the caller&#8217;s object is lost once the name is shadowed.<\/p>\n\n\n\n<p>The correct way to do this is to use <a href=\"https:\/\/stackoverflow.com\/questions\/10623302\/how-assignment-works-with-python-list-slice\">slice assignment<\/a> (whose logic is very different even though the syntax looks similar) to replace the entire contents of the input variable with the output of <code>bytearray.replace()<\/code>:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def remove_from_header_token_CLASS(tokens, line):\n    # line is expected to be a bytearray (mutable)\n    try:\n        column_CLASS = tokens.index(b'CLASS')\n    except ValueError:\n        # b'CLASS' is not among the tokens: leave line untouched\n        column_CLASS = None\n    else:\n        # Slice assignment overwrites the caller's bytearray in place\n        line[:] = line.replace(b'\\tCLASS', b'')\n\n    return column_CLASS<\/pre>\n\n\n\n<p>Since Python distinguishes <em>parameter<\/em> names from other <em>local<\/em> variables, applying the <em><code>nonlocal<\/code><\/em> keyword to a parameter (in hopes of broadening its scope) is rejected at compile time with a <code>SyntaxError<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>This is actually the same behavior as in MATLAB (dynamic typing), for the same reason: variables do not have to be declared as they are in C\/C++ (static typing). 
In MATLAB, if you pass a handle object (which works like a reference), you can shadow the input argument by creating a local variable of the same name:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"matlab\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">% DemoHandleClass.m\nclassdef DemoHandleClass &lt; handle\n    properties\n        x = 3;\n    end\nend\n\n% f_shadow.m\nfunction f_shadow(C)\n    C = DemoHandleClass();  % Shadowing\n    C.x = 14;\n    fprintf(\"x=%d in f_shadow()\\n\", C.x)\nend\n\n% f_no_shadow.m\nfunction f_no_shadow(C)\n    C.x = 14;\n    fprintf(\"x=%d in f_no_shadow()\\n\", C.x)\nend\n\n% demo_shadowing\nfunction demo_shadowing()\n    C = DemoHandleClass();\n    f_shadow(C);\n    fprintf(\"C.x=%d after shadowing()\\n\", C.x)\n\n    D = DemoHandleClass();\n    f_no_shadow(D);\n    fprintf(\"D.x=%d without shadowing()\\n\", D.x)\nend\n% Running demo_shadowing() outputs:\n% x=14 in f_shadow()\n% C.x=3 after shadowing()\n% x=14 in f_no_shadow()\n% D.x=14 without shadowing()<\/pre>\n\n\n\n<p>The above MATLAB program displays 14 without shadowing and 3 with shadowing (C became a new local variable that has nothing to do with the input argument C).<\/p>\n\n\n\n<p>The modern MATLAB editor will indirectly warn you that the input variable <code>C<\/code> in <code>f_shadow(C)<\/code> is unused, even though the code looks as if handle C were overwritten. 
<\/p>\n\n\n\n<p>For both Python and MATLAB (handle classes), assigning to an existing variable name does not change the underlying content of the object it referred to; instead, the old binding is discarded (the link to the input source is severed) and the variable name is reused locally (shadowing).<\/p>\n\n\n\n<p>MATLAB users rarely run into this because the language design heavily discourages side effects: we are supposed to return the changed local variable to the caller. The only way to do side effects in MATLAB is through handles (which requires defining a class, which is clumsy). Technically you can write the data to an external resource (e.g. a file) and read it back. But guess what? Resources are accessed through handles, so there&#8217;s no escape.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Of course, there&#8217;s a <a href=\"https:\/\/docs.python.org\/3\/faq\/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference\">better way<\/a> to do so (MATLAB&#8217;s preferred way): return the modified object back to the caller as if it were immutable:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def remove_from_header_token_CLASS(tokens, line):\n    # line DOES NOT HAVE TO BE MUTABLE\n    try:\n        column_CLASS = tokens.index(b'CLASS')\n    except ValueError:\n        column_CLASS = None\n    else:\n        line = line.replace(b'\\tCLASS', b'')\n\n    return column_CLASS, line<\/pre>\n\n\n\n<p>This is what I ultimately used (so I ended up not converting the <em>bytes<\/em> lines to <em>bytearray<\/em>), given that Python&#8217;s tuple syntax makes it easy to return multiple outputs like MATLAB. 
The call ended up looking like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">column_SPL_CLASS, line = remove_from_header_token_CLASS(tokens, line)\n<\/pre>\n\n\n\n<p>Nonetheless, I think there&#8217;s an important lesson to be learned about doing side effects in dynamically typed languages. Maybe I&#8217;ll need this one day if I get an excuse to do something more complicated that genuinely requires side effects.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>In summary,&nbsp;<strong>variable<\/strong> assignments in most dynamically typed languages will shadow the input argument with a newly generated <strong>local<\/strong> variable instead of modifying the data in the original input argument. This implies that <span style=\"text-decoration: underline;\">function side effects cannot be carried out through <strong>variable<\/strong> assignment<\/span>.<\/p>\n\n\n\n<p>The most common implication is: do not use plain assignment (=) on an input variable to modify its contents in a dynamically typed language.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;d like to write a function to selectively modify lines read from a file handle and write it back. By default, lines are read as byte() objects that are immutable, so I converted it to bytearray() instead so it can &hellip; <a href=\"https:\/\/wonghoi.humgar.com\/blog\/2019\/09\/19\/modifying-mutable-like-bytearray-arguments-data-in-python-functions\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[6,34],"tags":[],"class_list":["post-1959","post","type-post","status-publish","format-standard","hentry","category-note-to-self","category-python"],"_links":{"self":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/1959","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/comments?post=1959"}],"version-history":[{"count":27,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/1959\/revisions"}],"predecessor-version":[{"id":6799,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/1959\/revisions\/6799"}],"wp:attachment":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/media?parent=1959"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/categories?post=1959"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/t
ags?post=1959"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}