Python Gotcha: When copying a list ends up biting you

I recently spent a couple of hours chasing down a single bug that was wreaking havoc on my program’s output. I have been working on a QR Code generator, and one of the final steps before producing the code is applying eight different masks to the list of pixels to see which one yields the code that is most readable to a device. I could apply one mask without issues, but applying a second mask (whether in a loop or by copying and pasting the commands) produced erroneous data. It turns out the problem was in how I was (or actually wasn’t) making a copy of the list.

Let’s start with a simple example list:

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

What does this look like to you? To me it looks like a list of lists. It’s a convenient data structure that I use all the time. Let’s make a copy of it:

list_copy = test_list

Now, to see what’s actually going on we need to look at the id of each variable (I defined a small function to give us a nice output for this step):

def disp_id():
    # print() is called as a function here (Python 3 syntax)
    print("test_list id:", id(test_list))
    print("list_copy id:", id(list_copy))

>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936602407640

I didn’t make a copy of the list; I simply bound a second name to the same list object. Remember that, because it’s going to come back in just a minute. There are a couple of different ways to make a new copy of a list. Here’s the one I use because I think it’s the most readable:

list_copy = list(test_list)

Now let’s look at the id of each list:

test_list id: 139936602407640
list_copy id: 139936410105400

The lists now have different id numbers, which means they are actually different list objects.
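For reference, a few other spellings also produce a new list object. This is only a small sketch using the standard library; the variable names (alias, by_constructor, by_slice, by_copy_module) are made up for illustration and are not from my QR Code program.

```python
import copy

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

# Plain assignment: a second name for the same list object.
alias = test_list

# Three common ways to build a new outer list (names are illustrative).
by_constructor = list(test_list)
by_slice = test_list[:]
by_copy_module = copy.copy(test_list)

print(alias is test_list)  # True
for candidate in (by_constructor, by_slice, by_copy_module):
    # Equal contents, but a different object from the original.
    print(candidate == test_list, candidate is test_list)  # True False
```

All three behave the same way for the purposes of this post, so picking one is mostly a matter of readability.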

Now here’s the gotcha:

What happens if I change some data in the first list and print out its contents as well as the second list’s contents?

test_list[0][0] = 'big trouble'

>>> test_list
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]

This behavior is very hard to track down if you don’t know what you’re looking for. It happens because Python doesn’t treat this as one nested block of data; it sees a list whose elements are references to other list objects. When I copied the original list with list(), Python made a new outer list for me but populated it with references to the same inner objects (in this case each object is a list, but you will have the same problem with any mutable objects of your own). This is known as a shallow copy. There are a couple of ways to solve this, but the best is to use copy.deepcopy():

import copy

list_copy = copy.deepcopy(test_list)

>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936410105040

test_list[0] id: 139936410106048
list_copy[0] id: 139936410172664

Now you can see that the data in each list is the same, but the ids of both the outer list and the inner lists are different. You can safely change anything inside one list without affecting the data in the other.
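To put the whole gotcha in one place, here is a minimal sketch that compares a shallow copy, a per-row copy, and a deep copy using the is operator instead of reading id numbers by eye. The names shallow, per_row and deep are my own for this example, and the per-row version only protects one level of nesting.

```python
import copy

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

shallow = list(test_list)                  # new outer list, shared inner lists
per_row = [list(row) for row in test_list] # copies one level of nesting only
deep = copy.deepcopy(test_list)            # copies every nested level

# Identity checks on the first inner list.
print(shallow[0] is test_list[0])  # True
print(per_row[0] is test_list[0])  # False
print(deep[0] is test_list[0])     # False

# Mutating the original only leaks into the shallow copy.
test_list[0][0] = 'big trouble'
print(shallow[0][0])  # big trouble
print(per_row[0][0])  # first member
print(deep[0][0])     # first member
```

The per-row copy is enough for a flat list of lists like this one, but deepcopy is the safer choice if the inner objects might themselves contain mutable objects.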

Splitting VCF files using Python

I recently did a fresh install of Ubuntu 11.10. I forgot to export my contacts from Evolution and was horrified to learn these are not stored in a flat file and the database is not compatible between versions (great work Evolution devs).

Some poking around on the internet led me to a Perl script that was able to extract the data. Then some Python work let me format it correctly as a vCard (.vcf) file. But when I tried to import it I didn’t get all my contacts. More sleuthing led me to realize that only the first 75 were being imported. I wrote this short Python script to break up my 190-contact vCard file into parts that had no more than 75 entries each. I hope it will help you out too!

[gist id=1478337]
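In case the gist is not visible, here is a rough, independent sketch of the same idea (it is not the gist’s code). It assumes each contact sits between a BEGIN:VCARD line and an END:VCARD line; the function name split_vcards and the sample data are made up for illustration.

```python
def split_vcards(text, max_per_file=75):
    """Return a list of strings, each holding at most max_per_file vCards."""
    cards = []
    current = []
    for line in text.splitlines():
        marker = line.strip().upper()
        if marker == 'BEGIN:VCARD':
            current = [line]                 # start collecting a new card
        elif marker == 'END:VCARD':
            current.append(line)
            cards.append('\n'.join(current)) # card is complete
            current = []
        elif current:
            current.append(line)             # a line inside the current card

    chunks = []
    for start in range(0, len(cards), max_per_file):
        group = cards[start:start + max_per_file]
        chunks.append('\n'.join(group) + '\n')
    return chunks


# Made-up sample data: 190 tiny contacts, like my address book.
sample = ''.join(
    'BEGIN:VCARD\nVERSION:3.0\nFN:Contact %d\nEND:VCARD\n' % i
    for i in range(190)
)
parts = split_vcards(sample)
print([part.count('BEGIN:VCARD') for part in parts])  # [75, 75, 40]
```

Each returned string can then be written to its own .vcf file and imported separately.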

Python parallel port control


Looking for a really easy way to control your project from a computer? If you have a parallel port that isn’t being used, you’re in luck. The pyParallel module for Python makes it easy to toggle the pins on the parallel port.

First install the pyParallel module. It’s in the Ubuntu repositories:

sudo apt-get install python-parallel

To use the module just import it, instantiate an object, then write or read from that object.

import parallel
parPort = parallel.Parallel()

Now, this threw a permission error for me. A bit of searching showed that you need to remove the lp kernel module and load the ppdev module instead:

sudo rmmod lp
sudo modprobe ppdev

The lp module will load again the next time you reboot. Consider blacklisting it if you run automated Python scripts that need parallel port access.
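Once the port opens without errors, writing to the data pins could look something like the sketch below. It relies on pyParallel’s setData() method as I understand it from the module’s documentation; the helper write_sequence, the main() wrapper and the byte values are hypothetical and only there for illustration. Nothing touches the hardware until you call main() yourself.

```python
import time


def write_sequence(port, values, delay=0.5):
    """Send each value to the eight data pins, pausing between writes."""
    for value in values:
        port.setData(value & 0xFF)  # keep only the low byte (pins D0-D7)
        time.sleep(delay)


def main():
    # Needs pyParallel, a real parallel port and the ppdev module loaded.
    import parallel
    par_port = parallel.Parallel()
    # Hypothetical pattern: light pins D0..D3 one at a time, then all off.
    write_sequence(par_port, [0x01, 0x02, 0x04, 0x08, 0x00])
```

Keeping the port as a parameter means the helper can be tried out with a stand-in object before any hardware is connected.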

That’s it! Don’t you love Python? Of course, there are some additional functions available in this module, so check the documentation to see what else can be done.