Python Gotcha: When copying a list ends up biting you

I recently spent a couple of hours chasing down a single bug that was wreaking havoc on my program’s output. I have been working on a QR Code generator, and one of the final steps before producing the code is applying eight different masks to the list of pixels to see which one yields the code a device can read most reliably. I could apply one mask without issues, but applying a second mask, whether in a loop or by copying and pasting the commands, produced erroneous data. It turns out the problem was in how I was (or actually wasn’t) making a copy of the list.

Let’s start with a simple example list:

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member']
    ]

What does this look like to you? To me it looks like a list of lists. It’s a convenient data structure that I use all the time. Let’s make a copy of it:

list_copy = test_list

Now, to see what’s actually going on, we need to look at the id of each variable (I defined a small function to give us a nice output for this step):

def disp_id():
    # Print the identity (id) of the list object each name refers to
    print("test_list id: %d" % id(test_list))
    print("list_copy id: %d" % id(list_copy))

>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936602407640
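The `is` operator gives the same answer without comparing raw id numbers. Here is a minimal sketch, assuming Python 3 and reusing the example list, showing that a change made through one name is visible through the other:

```python
test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

# Plain assignment binds a second name to the SAME list object.
list_copy = test_list
print(list_copy is test_list)  # True: one object, two names

# Appending through one name is visible through the other.
list_copy.append(['fourth member'])
print(len(test_list))  # 4
```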

I didn’t make a copy of the list; I simply bound a new variable name to the same list object. Remember that, because it’s going to come back in just a minute. There are a couple of different ways to make a new copy of a list. Here’s the one I use, because I think it’s the most readable:

list_copy = list(test_list)

Now let’s look at the ids of each of the lists:

>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936410105400

The lists now have different id numbers which means they are actually different list objects.
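For reference, here is a sketch of the other common ways to get a new outer list, assuming Python 3 (`list.copy()` needs 3.3 or later). Each one creates a new outer list object, but each is a *shallow* copy: the elements inside are the very same objects as in the original, which is exactly the property that matters in what follows.

```python
import copy

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

# Four ways to make a new outer list; all of them are shallow copies.
copies = {
    'list()': list(test_list),
    'slice': test_list[:],
    'list.copy()': test_list.copy(),   # Python 3.3+
    'copy.copy()': copy.copy(test_list),
}

for name, c in copies.items():
    # Prints False (new outer list) and True (first inner list is shared).
    print(name, c is test_list, c[0] is test_list[0])
```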

Now here’s the gotcha:

What happens if I change some data in the first list and print out its contents as well as the second list’s contents?

test_list[0][0] = 'big trouble'

>>> test_list
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]

This behavior is very hard to find if you don’t know what you’re looking for. It’s caused by the fact that Python doesn’t see this as a list of lists holding data; it sees it as a list of references to list objects. When I copied the original list using the list() function, Python made a new outer list for me but populated it with references to the same objects inside (in this case each object is a list, but you will have the same problem with any mutable object, including instances of your own classes). This is called a shallow copy. There are a couple of ways to solve this, but the most general is copy.deepcopy(), which recursively copies the nested objects as well:

import copy

list_copy = copy.deepcopy(test_list)

>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936410105040
>>> id(test_list[0])
139936410106048
>>> id(list_copy[0])
139936410172664

Now you can see that the data in each list is the same, but the ids of both the outer lists and the inner lists are different. You can safely change anything inside one list without affecting the data in the other.
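To tie this back to the QR Code masking problem, here is a minimal sketch of trying several masks on the same pixel grid. The names `grid`, `MASKS` and `apply_mask` are hypothetical, the grid is a tiny made-up 0/1 example, and the two mask conditions are simplified illustrations (a real QR encoder also leaves function patterns such as the finder squares unmasked). The key point is that every mask starts from its own deep copy of the unmasked grid.

```python
import copy

# Hypothetical 0/1 pixel grid standing in for the unmasked QR data.
grid = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Illustrative mask conditions on (row, column); simplified for the example.
MASKS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
]

def apply_mask(pixels, condition):
    """Flip every pixel where condition(row, col) is true, in place."""
    for r, row in enumerate(pixels):
        for c in range(len(row)):
            if condition(r, c):
                row[c] ^= 1
    return pixels

# Each mask gets its own deep copy, so the original grid is never modified
# and every candidate starts from the same unmasked data.
candidates = [apply_mask(copy.deepcopy(grid), m) for m in MASKS]
```

With `list(grid)` in place of `copy.deepcopy(grid)`, the rows would be shared, so the second mask would start from the first mask’s output, which is the same kind of erroneous data described at the top. For a grid that only holds immutable values like ints, `[row[:] for row in grid]` would also work, but `deepcopy` handles arbitrary nesting.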
