Python Gotcha: When copying a list ends up biting you

I recently spent a couple of hours chasing down a single bug that was wreaking havoc on my program’s output. I have been working on a QR Code generator, and one of the final steps before making the code is applying eight different types of masking to the list of pixels to see which one produces the code that will be most readable to a device. I could apply one mask without issues, but applying the second mask (whether in a loop or by copying and pasting the commands) produced erroneous data. It turns out the problem was in how I was (or actually wasn’t) making a copy of the list.

Let’s start with a simple example list:


test_list = [
 ['first member', 1, 2, 3],
 ['second member', 4, 5, 6],
 ['third member']
 ]

What does this look like to you? To me it looks like a list of lists. It’s a convenient data structure that I use all the time. Let’s make a copy of it:


list_copy = test_list

Now, to see what’s actually going on we need to look at the id of each variable (I defined a small function to give us a nice output for this step):


def disp_id():
    # Python 2 print statement; the comma adds the space before the id
    print "test_list id:", id(test_list)
    print "list_copy id:", id(list_copy)

>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936602407640

I didn’t make a copy of the list; I simply bound a second variable name to the same list object. Remember that, because it’s going to come back in just a minute. There are a couple of different ways to make a new copy of a list. Here’s the one I use because I think it’s the most readable:


list_copy = list(test_list)

Now let’s look at the ids of each of the lists:


>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936410105400

The lists now have different id numbers which means they are actually different list objects.
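
For reference, here is a minimal sketch of the other common ways to get a new outer list. It is written for Python 3, unlike the Python 2 snippets in this post, and the variable names are only for illustration. The `is` operator compares object identity, so it is a shorter way of comparing ids.

```python
import copy

test_list = [
    ['first member', 1, 2, 3],
    ['second member', 4, 5, 6],
    ['third member'],
]

# Plain assignment: just a second name for the same list object.
alias = test_list

# Three ways to build a new outer list.
by_constructor = list(test_list)
by_slice = test_list[:]
by_copy_module = copy.copy(test_list)

print(alias is test_list)   # True
for candidate in (by_constructor, by_slice, by_copy_module):
    # Equal contents, but a different list object each time.
    print(candidate == test_list, candidate is test_list)   # True False
```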

Now here’s the gotcha:

What happens if I change some data in the first list and print out its contents as well as the second list’s contents?


test_list[0][0] = 'big trouble'

>>> test_list
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]

This behavior is very hard to track down if you don’t know what you’re looking for. It’s caused by the fact that Python doesn’t store this as one nested block of data; the outer list only holds references to the inner list objects. When I copied the original list using the list() function, Python made a new outer list for me but populated it with references to the same objects inside (in this case each object is a list, but you will have the same problem with your own objects). This is usually called a shallow copy. There are a couple of ways to solve this, but the best is to use copy.deepcopy():


import copy

list_copy = copy.deepcopy(test_list)

>>> list_copy
[['big trouble', 1, 2, 3], ['second member', 4, 5, 6], ['third member']]
>>> disp_id()
test_list id: 139936602407640
list_copy id: 139936410105040
>>> id(test_list[0]), id(list_copy[0])
(139936410106048, 139936410172664)

Now you can see that the data in each list is the same, but the ids of the outer lists and of the objects inside them are different. You can safely change anything inside one list without affecting the data in the other.
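
To tie this back to the masking problem, here is a minimal Python 3 sketch of trying several in-place changes on the same starting grid. The `invert_row` helper and the `pixels` grid are made up for illustration and are not taken from the QR Code generator; the point is only that each pass starts from a fresh deep copy. For a plain list of lists, copying each row with a slice does the same job.

```python
import copy

# Hypothetical 2x3 pixel grid; not real QR data.
pixels = [
    [0, 1, 0],
    [1, 1, 0],
]

def invert_row(grid, row_index):
    """Flip every pixel in one row, changing the inner list in place."""
    for col in range(len(grid[row_index])):
        grid[row_index][col] = 1 - grid[row_index][col]
    return grid

results = []
for row_index in range(len(pixels)):
    working = copy.deepcopy(pixels)   # fresh copy for every pass
    results.append(invert_row(working, row_index))

print(pixels)       # [[0, 1, 0], [1, 1, 0]]  (original untouched)
print(results[0])   # [[1, 0, 1], [1, 1, 0]]
print(results[1])   # [[0, 1, 0], [0, 0, 1]]

# For a simple list of lists, slicing each row also gives independent rows.
row_copies = [row[:] for row in pixels]
row_copies[0][0] = 9
print(pixels[0][0])   # 0
```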

Splitting VCF files using Python

I recently did a fresh install of Ubuntu 11.10. I forgot to export my contacts from Evolution and was horrified to learn these are not stored in a flat file and the database is not compatible between versions (great work Evolution devs).

Some poking around on the internet led me to a Perl file that was able to get the data out. Then some Python work let me format it correctly as a VCARD (.VCF) file. But when I tried to import it, I didn’t get all my contacts. More sleuthing led me to realize that only the first 75 were being imported. I wrote this short Python script to break up my 190-contact VCARD file into parts that had no more than 75 entries each. I hope it will help you out too!
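
The idea is simple enough to sketch. What follows is not the original script, just a minimal Python 3 version of the approach, assuming every contact sits between a `BEGIN:VCARD` line and an `END:VCARD` line. The function name `split_vcards` and the output file names in the comment are made up.

```python
def split_vcards(text, max_per_file=75):
    """Return a list of strings, each holding at most max_per_file vCards."""
    cards = []
    current = []
    for line in text.splitlines():
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = [line]                  # start collecting a new card
        elif marker == "END:VCARD":
            current.append(line)
            cards.append("\n".join(current))  # card is complete
            current = []
        elif current:
            current.append(line)              # a line inside the current card

    chunks = []
    for start in range(0, len(cards), max_per_file):
        chunks.append("\n".join(cards[start:start + max_per_file]) + "\n")
    return chunks

# Build a fake 190-contact file to try it out.
sample = "\n".join(
    "BEGIN:VCARD\nVERSION:3.0\nFN:Contact {}\nEND:VCARD".format(n)
    for n in range(190)
)
parts = split_vcards(sample)
print([part.count("BEGIN:VCARD") for part in parts])   # [75, 75, 40]

# Writing the parts out (hypothetical file names):
# for number, part in enumerate(parts, 1):
#     with open("contacts_{}.vcf".format(number), "w") as handle:
#         handle.write(part)
```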

Parallel port trigger tells me when I’m transcoding

I decided to finish up my Cylon Eye (Larson Scanner) project by adding it as a status indicator for my computer. I record over-the-air programming and transcode it to DVD quality. Since things can be a bit slower when FFmpeg is running, I set it up so the Eye scans to let me know a video file is being processed in the background. All the details are after the break, but here’s the gist of the system:

  • A Python script is started by the FFmpeg transcoding script
  • It controls the parallel port, driving pin 1 high to turn on the Cylon Eye
  • The ‘pidof’ command is called every minute for FFmpeg. When it is not found, the Cylon Eye is turned off and the script exits
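
Here is a rough Python 3 sketch of that loop, not the script itself. The hardware side is left as a `set_pin` callback because the actual parallel port call depends on the library you use (pyparallel is one option), and the names `watch_process`, `pidof_finds`, `set_pin`, and `is_running` are made up for this example. `pidof` exits with a non-zero status when no matching process exists, which is what the check relies on.

```python
import subprocess
import time

def pidof_finds(name):
    """Return True if `pidof` reports at least one process called name."""
    result = subprocess.run(
        ["pidof", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def watch_process(name, set_pin, is_running=pidof_finds,
                  poll_seconds=60, sleep=time.sleep):
    """Hold the pin high while the process exists, then drop it and return."""
    set_pin(True)                    # eye on
    try:
        while is_running(name):
            sleep(poll_seconds)      # check again in a minute
    finally:
        set_pin(False)               # eye off, even if interrupted

# Dry run with stand-ins for the hardware and for pidof.
pin_log = []
answers = iter([True, True, False])  # "found" twice, then gone
watch_process("ffmpeg", pin_log.append,
              is_running=lambda name: next(answers),
              sleep=lambda seconds: None)
print(pin_log)   # [True, False]
```

Passing the pin control in as a function keeps the polling logic testable on a machine with no parallel port at all.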


Conway’s Game of Life

I finally got around to programming Conway’s Game of Life. I’ve long wanted to give this a try but just today decided to take some time to myself and actually do it. I chose Python, a language I’ve worked with quite a bit but one I’ve never used for GUI programming. I spent the majority of my time trying to figure out how to display the Life grid, and decided to use a package called pygame, which I enjoy quite a bit!

About the code:

The game itself was actually pretty easy to code. I decided to make a multidimensional array as a lookup table. The first dimension is indexed by whether the current cell is dead (0) or alive (1). The second dimension is indexed by the number of living cells around the test cell. The value looked up is the status of the test cell after the rules are applied for the next generation. The rest is just iterating through the various buffer arrays and then writing to the display.
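
The lookup table idea can be sketched like this. It is a minimal Python 3 version using the standard Life rules (a live cell survives with two or three neighbours, a dead cell comes alive with exactly three), not a copy of the code in the repository. The names `RULES`, `live_neighbours`, and `next_generation` are made up, and treating cells beyond the edge as dead is an assumption.

```python
# RULES[state][live_neighbours] -> state in the next generation.
# Index 0 is a dead cell, index 1 is a live cell; neighbour counts run 0-8.
RULES = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0],   # dead: born with exactly 3 neighbours
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # alive: survives with 2 or 3
]

def live_neighbours(grid, row, col):
    """Count live cells around (row, col); cells off the edge count as dead."""
    total = 0
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and (r, c) != (row, col):
                total += grid[r][c]
    return total

def next_generation(grid):
    """Build a new grid so the current one is not changed mid-update."""
    return [
        [RULES[cell][live_neighbours(grid, r, c)] for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]

blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(next_generation(blinker))   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Writing into a new grid matters for the same reason as the list-copy gotcha: updating cells in place would let half-updated neighbours leak into the counts.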

The number of cells, cell size, gap between cells, delay between generations, and percent of live cells at genesis are all configurable. The game checks for stagnation at the end of evolution and will change the window title to show how many generations it took to reach equilibrium.

The pygame package turned out to be very easy to work with. It has an event handler that takes care of the delay time between generations. Take a look at it if you are ever working on a game!

I’d love to hear your thoughts about my code. Check it out and then leave a comment or send me a tweet!

Source Code Repository

Follow Me:

@szczys

What I’ve learned about using the Beautiful Soup Python module

I like to use Beautiful Soup in combination with urllib2 to parse HTML from Python scripts. The problem is that I spend about 30 minutes relearning how to use it every time I start a new project. So, for my own use (and maybe yours), here are my quick tips for syntax.

I always start off the same way, two lines of code to snag and objectify the HTML:

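import urllib2
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3; with bs4 use "from bs4 import BeautifulSoup"

html = urllib2.urlopen("http://hackaday.com/comments/feed").read()
soup = BeautifulSoup(html)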

From there it's a matter of working with the 'soup' data object. This one gets an RSS feed of comments. The comments are partitioned into <item> tags which you can traverse like this:

soup('item')[0]

Calling soup('item') returns a list of matching tags, so the index picks one of them (this is item 0). But you can also iterate through the list using:

for item in soup('item'):
    print item  # each item is a tag you can search the same way

From there just walk through the tree hierarchy. Here's how you can get the publish date (string surrounded by <pubdate> tags) for the item. Notice that you need to index the pubdate in order to access its string data:

soup('item')[0]('pubdate')[0].string

The part that always confuses me is the need for the index. It identifies which tag you're accessing in case there are multiples in this part of the tree. You can get the number of tags found by wrapping your tag term in the len() function:

len(soup('item'))

For this feed that should always return 15, because that's the number of comments WordPress is set to publish in the RSS feed.

There are other ways to do this using soup.findAll, but I find this one usually works the best.