Random notes from mg

a blog by Marius Gedminas

Marius is a Python hacker. He works for Programmers of Vilnius, a small Python/Zope 3 startup. He has a personal home page at http://gedmin.as. His email is marius@gedmin.as. He does not like spam, but is not afraid of it.

Thu, 12 Jun 2008

Hunting memory leaks in Python

At work the functional test suite of our application used up quite a lot of RAM (over 500 megs). For a long time it was cheaper to buy the developers an extra gig of RAM than to spend time hunting down a possible memory leak, but finally curiosity overcame me and I started investigating.

Warning: long post ahead. With pictures.

Running a subset of the tests in a loop quickly proved that the memory leak is real:

Graph of memory usage versus time for the same test repeated 5 times

The graph was produced by instrumenting the test runner to record the timestamp, memory usage (VmSize from /proc/$pid/status) and the number of objects being tracked by the garbage collector (len(gc.get_objects())) into a CSV file, and then writing a simple Python program to plot it with matplotlib.
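The instrumentation itself isn't shown in the post; a minimal sketch of such a recorder might look like this (the file name and function name are made up, and reading /proc means this is Linux-only):

```python
import csv
import gc
import os
import time

def record_memory_stats(writer):
    """Append one row: (timestamp, VmSize in kB, objects tracked by gc).

    VmSize is read from /proc/$pid/status, so this only works on Linux.
    """
    vmsize = 0
    with open('/proc/%d/status' % os.getpid()) as f:
        for line in f:
            if line.startswith('VmSize:'):
                vmsize = int(line.split()[1])  # the value is in kB
                break
    writer.writerow([time.time(), vmsize, len(gc.get_objects())])

# e.g. call this once after every repetition of the test suite
with open('memory.csv', 'w') as f:
    record_memory_stats(csv.writer(f))
```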

I love matplotlib for the ease of use, even though sometimes I wish the docs were a bit nicer.
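The plotting script isn't shown either; a rough sketch, with made-up sample data standing in for the recorded CSV rows (the Agg backend lets it run without a display):

```python
import matplotlib
matplotlib.use('Agg')  # render straight to a file; no display needed
import matplotlib.pyplot as plt

# sample data in the same shape as the recorded CSV rows:
# (timestamp in seconds, VmSize in kB, objects tracked by gc)
rows = [(0, 500000, 800000), (60, 520000, 810000),
        (120, 540000, 820000), (180, 560000, 830000)]
timestamps = [r[0] for r in rows]
vmsizes = [r[1] for r in rows]
objcounts = [r[2] for r in rows]

fig, ax1 = plt.subplots()
ax1.plot(timestamps, vmsizes, 'b-')
ax1.set_xlabel('time (s)')
ax1.set_ylabel('VmSize (kB)', color='b')
ax2 = ax1.twinx()              # second y-axis for the object count
ax2.plot(timestamps, objcounts, 'g-')
ax2.set_ylabel('objects tracked by gc', color='g')
fig.savefig('memory.png')
```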

But wait, I hear you say, Python is a garbage-collected language! How can it leak memory?

I'm glad you asked. The trouble is that sometimes an object created by the test is referenced from a global variable, and that keeps it from being collected. The tricky thing is to find where that reference comes from, and what the object being referenced is. There are 800 thousand live objects; how do you find the offending ones?

It took quite a while to think of a solution. Finally my coworker Ignas suggested drawing object graphs with graphviz, and I developed a module with a few convenient helper functions.

I put a breakpoint at the very end of the app and started looking around. Here's the number of in-memory object databases:

(Pdb) checks.count('DB')
6

There shouldn't be any, or there should be at most one (a global in-memory RAM database used for tracking browser sessions or something like that)! Let's see what objects are pointing to the last one, limiting the referencing chains to 15 objects:

(Pdb) checks.show_backrefs(checks.by_type('DB')[-1])
Graph written to objects.dot (185 nodes)
Image generated as objects.png

The image produced is nice, but large (9760 x 8008 pixels), so I'm not going to show it here in full. Here's a shrunken version:

Object reference graph showing the memory leak

By the way, GIMP eats up a gig of RAM with it open.

If you could zoom in and pan around, and if you knew the colour code, you'd immediately notice the green box indicating a module in the top-right corner:

Part of the object reference graph showing the source of the leak

Let me show you just the reference chain:

(Pdb) import inspect
(Pdb) chain = checks.find_backref_chain(checks.by_type('DB')[-1], inspect.ismodule)
(Pdb) in_chain = lambda x, ids=set(map(id, chain)): id(x) in ids
(Pdb) checks.show_backrefs(chain[-1], len(chain), filter=in_chain)
Graph written to objects.dot (15 nodes)
Image generated as objects.png

Chain of objects that keep the DB in memory

To get rid of this leak I had to clear zope.app.error.error._temp_logs in the test tear-down by calling zope.app.error.error._clear():

Graph of memory usage versus time for the same test repeated 5 times

I attribute the slight memory increase on the second repetition to memory fragmentation: new objects are allocated, old objects are freed, the total number of objects stays the same, but now there are some gaps in the memory arena. This effect disappears on the third and later repetitions, so I'm not worrying.

There were a couple of other, smaller memory leaks elsewhere. At the end of the day the full test suite fit in under 200 megs of RAM.

Don't let anyone tell you that graph theory is useless in the real world. Also, Python's garbage collector's introspection powers are awesome!

Update: see the annotated source code for the 'checks' module.

Update 2: the 'checks' module was open-sourced as objgraph.

posted at 01:04 | tags: | permanent link to this entry | 4 comments
Hi Marius,

thanks for the great post and useful tool. I've played around with it in order to find a memory leak that's been stalling my project, but so far I haven't been able to track it down (although I've narrowed it down).

For starters, I would like to ask a question about the example you posted in the docs of objgraph:

"def computate_something(_cache={}):
...  _cache[42] = dict(foo=MyBigFatObject(),
...  bar=MyBigFatObject())
...  # a very explicit and easy-to-find "leak" but oh well"

Could you explain real quick how this piece of code leads to a memory leak in Python? It might be an obvious thing, but I would have expected the GC to detect something like this.

Also, I have a question regarding references from "frames". I'm not entirely sure how to interpret the back-reference list that I've generated, as it contains quite a few "frames" along the way back to a module:

http://dl.dropbox.com/u/7646876/error1.png

Maybe you could shed some light on this, too.

Thanks!
posted by chris at Mon Jan 10 15:05:22 2011
It's not a real leak (which is why I put "leak" in quotes), since the object is still reachable: computate_something.func_defaults[0] is the _cache dictionary, and it holds a reference to two MyBigFatObjects.
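To make that concrete, here's roughly how the cached objects remain reachable (MyBigFatObject is a placeholder class, as in the objgraph docs; func_defaults is spelled __defaults__ in Python 3):

```python
class MyBigFatObject(object):
    pass

def computate_something(_cache={}):
    _cache[42] = dict(foo=MyBigFatObject(),
                      bar=MyBigFatObject())

computate_something()

# the objects stay reachable through the function's default argument
# (func_defaults in Python 2, __defaults__ in Python 3):
cache = computate_something.__defaults__[0]
print(42 in cache)  # → True
```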

Frames are stack frames; their appearance means that some functions have local variables that directly or indirectly refer to your object.

I see two references to your Options object: one is a global variable obj, defined in the main script, and another is a local variable eval_data['ptr'], holding a huge list (320 thousand items, wow!), which is kept in memory because at some point an exception happened, and the traceback was stashed into sys.exc_traceback (an internal variable that you shouldn't normally access, using sys.exc_info() instead).

Even if the exception was caught and handled successfully, the sys module always holds a reference to the last exception and its traceback -- and with the traceback, all the local variables in all the functions.  There are ways to make sure exceptions don't lead to excessive memory usage: del local variables at the end of a function (if those variables hold onto big chunks of memory), or clear sys.exc_traceback by, e.g., raising and immediately catching a new exception after you're done with the computation.
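The del-the-locals trick can be sketched like this (BigChunk is a hypothetical stand-in for a memory-hungry object; note that sys.exc_traceback is a Python 2 detail that no longer exists in Python 3):

```python
import gc

class BigChunk(object):
    """Hypothetical stand-in for an object holding lots of memory."""

def compute():
    data = BigChunk()
    result = 42   # ... do the actual work with data here ...
    del data      # drop the local before returning, so that a stashed
                  # traceback cannot keep the big object alive
    return result

compute()
# no BigChunk instances survive the call
print(any(isinstance(o, BigChunk) for o in gc.get_objects()))  # → False
```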
posted by Marius Gedminas at Mon Jan 10 19:36:28 2011
Hi thanks for the tool. I just blogged about my experience with it too.

http://www.stuartmitchell.com/journal/2011/8/5/finding-python-memory-leaks-with-objgraph.html
posted by Stuart Mitchell at Fri Aug 5 05:19:42 2011
Great tool indeed, you saved me!

I had numpy-subclass arrays that were not deleted because they referenced some slice views of themselves. And I could not have found that by myself in thousands of lines of old code...

By the way, it would be nice to have some idea of how much memory is spent (e.g. per class type). Usually, one only starts thinking of a memory leak if too much memory is taken. I found this:
http://code.activestate.com/recipes/546530/
but don't know if it can be of any use.

Thanks again, a lot!
posted by julien at Mon Jan 28 16:07:28 2013
