Random notes from mg

a blog by Marius Gedminas

Marius is a Python hacker. He works for Programmers of Vilnius, a small Python/Zope 3 startup. He has a personal home page at http://gedmin.as. His email is marius@gedmin.as. He does not like spam, but is not afraid of it.

Sat, 07 Aug 2010

Profiling with Dozer

Dozer is mostly known for its memory profiling capabilities, but the as-yet unreleased version has more. I've already talked about log capturing; now it's time for

Profiling

This WSGI middleware profiles every request with the cProfile module. To see the profiles, visit a hidden URL /_profiler/showall:

List of profiles

What you see here is heavily tweaked in my fork branch of Dozer; the upstream version had no Cost column and didn't vary the background of Time by age (that last bit helps me see clumps of requests).
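For readers who haven't seen per-request profiling middleware before, here is a minimal sketch of the general idea. It is not Dozer's actual code: the class name ProfilingMiddleware and the in-memory profiles list are assumptions made up for illustration (Dozer itself writes profiles to disk).

```python
import cProfile
import pstats


class ProfilingMiddleware:
    """Hypothetical WSGI middleware: profile every request with cProfile."""

    def __init__(self, app):
        self.app = app
        self.profiles = []  # (request path, pstats.Stats) pairs, kept in memory

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # Consume the response iterable inside the profiled call, so that
        # applications returning lazy generators are measured as well.
        body = profiler.runcall(
            lambda: list(self.app(environ, start_response)))
        self.profiles.append(
            (environ.get('PATH_INFO', ''), pstats.Stats(profiler)))
        return body
```

Wrapping an application is then just app = ProfilingMiddleware(app). A real implementation would also have to call close() on the wrapped iterable and serve the collected profiles somewhere; both are left out here.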

Here's what an individual profile looks like:

One profile

The call tree nodes can be expanded and collapsed by clicking on the function name. There's a hardcoded limit of 20 nesting levels (upstream had a limit of 15). Sadly, that appears not to be enough for practical purposes, especially if you start profiling Zope 3 applications...

You can also take a look at the WSGI environment:

WSGI environment expanded

Sadly, nothing about the response is captured by Dozer. I'd've liked to show the Content-Type and perhaps Content-Length in the profile list.

The incantation in development.ini is

[filter-app:profile]
use = egg:Dozer#profile
profile_path = /tmp/profiles
next = main

Create an empty directory /tmp/profiles and make sure other users cannot write to it. Dozer stores captured profiles as Python pickles, which are insecure and allow arbitrary command execution.
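One way to get such a directory on a POSIX system, sketched in Python. The helper name make_private_dir is made up for this example, and it assumes the directory is either missing or already owned by you:

```python
import os
import stat


def make_private_dir(path):
    """Create `path` so that only the owner can read, write or enter it."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    # makedirs() leaves the mode of an existing directory alone (and applies
    # the umask to a new one), so set the permissions explicitly as well.
    os.chmod(path, 0o700)
    # Return the resulting permission bits so the caller can double-check.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling make_private_dir('/tmp/profiles') should return 0o700, i.e. no access at all for group and others.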

To enable the profiler, run paster like this:

$ paster serve development.ini -n profile

Bonus feature: call graphs

Dozer also writes a call graph in Graphviz "dot" format in the profile directory. Here's the graph corresponding to the profile you saw earlier, as displayed by the excellent XDot:

Call graph

See the fork where the "hot" red path splits into two?

Call graph, zoomed in

On the left we have Routes deciding to spend 120 ms (70% of total time) recompiling its route maps. On the right we have the actual request dispatch. The actual controller action is called a bit further down:

Call graph, zoomed in

Here it is, highlighted: 42 ms (24% of total time), almost all of which is spent in SQLAlchemy, loading the model object (a 2515-byte image stored as a blob) from SQLite.
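If you're curious how a call graph like this can be derived from cProfile data, here is a rough sketch. It is not Dozer's implementation: stats_to_dot is a made-up name, there is no colouring or cutoff for cheap functions, and it relies on the stats attribute of pstats.Stats, which is an implementation detail rather than a documented API.

```python
import cProfile
import pstats


def stats_to_dot(stats):
    """Render caller -> callee edges of a pstats.Stats object as dot text."""
    def node_id(func):
        filename, lineno, name = func
        return '%s:%s:%s' % (filename, lineno, name)

    lines = ['digraph profile {']
    # stats.stats maps (filename, lineno, name) to (cc, nc, tt, ct, callers),
    # where ct is cumulative time in seconds and callers is a dict keyed by
    # the calling function.
    for func, (cc, nc, tt, ct, callers) in stats.stats.items():
        lines.append('  "%s" [label="%s\\n%.2f ms"];'
                     % (node_id(func), func[2], ct * 1000))
        for caller in callers:
            lines.append('  "%s" -> "%s";' % (node_id(caller), node_id(func)))
    lines.append('}')
    return '\n'.join(lines)
```

Feed it pstats.Stats(profiler) for a cProfile.Profile that has finished running, save the result to a .dot file and open it in XDot. File names containing quotes or backslashes would need escaping, which this sketch skips.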

A mystery: pickle errors

When I first tried to play with the Dozer profiler, I was attacked by innumerable exceptions. Some of those were due to a lack of configuration (profile_path) or invalid configuration (directory not existing), or not knowing the right URL (going to /_profiler raised TypeError). I tried to make Dozer's profiler more forgiving or at least produce clearer error messages in my fork branch, e.g. going to /_profiler now displays the profile list.

However some errors were very mysterious: some pickles, written by Dozer itself, could not be unpickled. I added a try/except that put those at the end of the list, so you can see and delete them.

Pickle errors
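The try/except approach boils down to something like the following sketch. The function name load_profiles and the '.pkl' suffix are assumptions made for illustration, not Dozer's actual code:

```python
import os
import pickle


def load_profiles(directory, suffix='.pkl'):
    """Return (good, broken): loaded profiles, then files that failed."""
    good, broken = [], []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(suffix):
            continue  # skip anything that isn't a pickle, e.g. dot files
        path = os.path.join(directory, name)
        try:
            with open(path, 'rb') as f:
                good.append((name, pickle.load(f)))
        except Exception as exc:  # a broken pickle can raise almost anything
            broken.append((name, exc))
    return good, broken
```

The caller can then list the good profiles first and the broken ones last, together with the error, so they can be inspected and deleted. As noted earlier, only unpickle files from a directory nobody else can write to.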

Does anybody have any clues as to why profile.py might be writing out broken pickles?

Update: as Ben says in the comments, my changes have been accepted upstream. Yay!

posted at 06:06 | 5 comments
Incidentally, I'm rather unhappy with Python profilers giving me the method name (__call__) but not the class name (did you see how many different __call__s there were?).  There's filename + line number (shown in a tooltip on hover), so you can figure that out yourself, but it's tedious.
posted by Marius Gedminas at Sat Aug 7 17:06:49 2010
I've pulled your changes upstream. The Routes recompiling is perhaps due to a setting where it re-scans the controllers on every request during debug mode in Pylons apps. This can be turned off in the routing.py file.

Nice to see the improvements!
posted by Ben Bangert at Sat Aug 7 20:50:21 2010
Nice post
posted by Darius Damalakas at Mon Aug 9 12:35:27 2010
As you requested, I'm continuing the comments here, not on Facebook (although why do you post to FB then...). Anyway.

Yes, it is a big step forward compared to a traditional profiler. But there is still tons of work to be done to make it useful. I don't know why, but in 99% of cases I don't need profiling for my applications, simply because I profile them "on the fly" in my head, so I can almost always say where it lags and why (and the profiler usually confirms that). However, a profiler is very useful if you can attach it to a real running application and test it under real load; that is where we often miscalculate, forgetting Big O (or just taking shortcuts in most cases). :)

As for the graph, it looks really sexy and nice (at first glance); however (at second glance) I see very little value in it, precisely no more than seeing a whole hierarchy of... well, that's the question: of what? No classes are visible either. Maybe I'm missing something, but "get 42.32ms" tells me no more than that somewhere some object tried to pull some other object out of some sort of collection and did that in 42.32 ms. How is this helpful to you, if all the important info I am looking for is just totally missing? I find it kinda useless, especially on projects like Plone, where a bazillion slow classes are loaded into the expensive DDR3 just to show some "hello world"... But again, I never tried Dozer, I'm just reading your post, so I might be wrong.

Honestly, lately (for maybe a few years) I haven't used Python much for appdev (unless it is Jython or various Unix infrastructure work), but the first thing I truly miss in the Python world, as an app developer, for profiling anything bigger than a few classes is something like this: https://visualvm.dev.java.net/profiler.html (go ahead, take a look at it just out of curiosity).

Hence my suggestion for Dozer would be to show which class is actually being called in the graph. But then I have no idea how to keep it looking nice. Maybe it is better not to have this great-looking graph (yes, it is really great-looking, no sarcasm here), but instead just repeat the layout that VisualVM has? For example, you would have the full class name, where it is located, how much time it takes and what it is calling. That would make way more sense, I reckon.

Also, maybe the tables in the first and last screenshots would be better sorted by load time (that beige bar in the middle)?

Well, I dunno, just my few yen on the whole post. :-)
posted by BM at Wed Aug 11 15:24:30 2010
Loving the profiler.  I found coloring graph edges by call count on a logarithmic scale helpful for large graphs at times.
posted by yvl at Fri Oct 21 14:06:26 2011
