a blog by Marius Gedminas

EuroPython 2007

The EuroPython conference was in my home town, Vilnius, this year. The conference is over now, but the sprints will continue until Saturday.

If I had to pick the three most interesting (to me) talks, I'd choose

Sadly, none of them have the slides available for download yet.

The lightning talks were also very interesting and entertaining (have you seen the Grok trailer?).

Custom traversal in Zope 3

Certain things are not quite obvious in Zope 3. Custom traversal is one of those: I always have to go and look at an example when I need it. Here's the example:

Say, you have a content object that provides IMySite and is exposed to the web at /mysite. You want to implement custom traversal for names under it, e.g. have /mysite/mycalendar return some object specific to the user that's currently logged in.

You need to provide an IBrowserPublisher adapter for (IMySite, IBrowserRequest):

from zope.component import adapts, queryMultiAdapter
from zope.interface import implements
from zope.publisher.interfaces import NotFound
from zope.publisher.interfaces.browser import IBrowserRequest, IBrowserPublisher

from mypackage.interfaces import IMySite


class MySiteTraverser(object):
    """Browser traverser for IMySite."""

    adapts(IMySite, IBrowserRequest)
    implements(IBrowserPublisher)

    def __init__(self, context, request):
        self.context = context
        self.request = request

    def browserDefault(self, request):
        """Return the default view of /mysite."""
        # XXX: use getDefaultViewName instead of assuming it's index.html
        return self.context, ('index.html', )

    def publishTraverse(self, request, name):
        """Traverse to /mysite/$name."""
        if name == 'mycalendar':
            mycalendar = ... # TODO: do something to get the appropriate object
            return mycalendar

        # if self.context is a container of some sort,
        # you'll have to add traversal to items here manually.

        # fall back to views
        view = queryMultiAdapter((self.context, request), name=name)
        if view is not None:
            return view

        # give up and return a 404 Not Found error page
        raise NotFound(self.context, name, request)

Now register it in ZCML with

    <view
        for="mypackage.interfaces.IMySite"
        type="zope.publisher.interfaces.browser.IBrowserRequest"
        provides="zope.publisher.interfaces.browser.IBrowserPublisher"
        factory="mypackage.mymodule.MySiteTraverser"
        permission="zope.Public"
        />

Note that this is the regular view directive, not browser:view.

Update: Philipp von Weitershausen shows how Grok simplifies this (site disappeared; here's an Internet Archive link). Check out the Grok website.

Update: Added the missing __init__ method, thanks to Yuan Hong for noticing.

Upgrade to Feisty Fawn

Last Saturday I upgraded my laptop from Ubuntu Edgy (6.10) to Feisty (7.04). The upgrade broke down a bit (like every other Ubuntu upgrade before it, but in a different way): I left the update manager running, and when I came back several hours later, it was gone. Apt in a terminal complained about the upgrade being interrupted and prompted me to run dpkg --configure -a. I did that, answered a bunch of questions about which config files I wanted to override, rebooted, and that was it. I'm now running Feisty.

Good things after the upgrade:

  • Note pinning and bulleted lists in Tomboy
  • I can click on links in the channel topic in xchat-gnome again

Broken things after the upgrade:

  • bitlbee stopped working
  • ctrl-arrow keys stopped working in vim (and mutt) inside gnome-terminal (which now sends ESC O 1; 5 D where it used to send ESC O 5 D)
  • The laptop still crashes when resuming every now and then

Benchmarking Zope 3 apps

Our Zope 3 application had a speed problem: a view used to export an XML file with user data stored in the ZODB took two thirds of an hour. Clearly, the time to optimize it has arrived.

I've been indoctrinated by reading various sources (Federico Mena-Quintero's blog is a good one) that you must do two things before you start fiddling with the code:

  1. Create a reproducible benchmark
  2. Profile

Step one: benchmark. A coworker convinced me to reuse the unit/functional test infrastructure. I've decided that benchmarks will be functional doctests named benchmark-*.txt. In order to not clutter (and slow down) the usual test run, I've changed the test collector to demote benchmarks to a lower level, so they're only run if you pass the --all (or --level 10) option to the test runner. Filtering out the regular tests to get only the benchmarks is also easy: test.py -f --all . benchmark.
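
The demotion step could look roughly like the sketch below. It is not the actual collector: the names is_benchmark, collect, and BENCHMARK_LEVEL are made up for illustration, and it assumes the test runner decides what to run from a level attribute set on each suite.

```python
import os
import unittest

# Hypothetical value: any level above what a default test run includes.
BENCHMARK_LEVEL = 10


def is_benchmark(path):
    """True for doctest files named benchmark-*.txt."""
    name = os.path.basename(path)
    return name.startswith('benchmark-') and name.endswith('.txt')


def collect(paths, make_suite):
    """Build one suite per doctest file and demote the benchmarks.

    make_suite is whatever turns a file name into a test suite
    (for example a wrapper around doctest.DocFileSuite).
    """
    suites = []
    for path in sorted(paths):
        suite = make_suite(path)
        # Assumption: the runner skips suites whose level is higher
        # than the level requested on the command line.
        suite.level = BENCHMARK_LEVEL if is_benchmark(path) else 1
        suites.append(suite)
    return suites
```

Regular tests keep level 1 and run as before; only files matching the naming convention are pushed out of the default run.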

Next, the measurement itself. Part of the doctest is used to prepare for the benchmark (e.g. create 1000 random users in the system, with a fixed random seed to keep it repeatable). The benchmark itself is enclosed in function calls like this:

>>> benchmark.start('export view (1000 users)')
...
>>> benchmark.stop()

The result is appended to a text file (benchmarks.txt in the current directory) as a single line:

export view (1000 users), 45.95

where 45.95 is the number of CPU seconds (measured with time.clock) between start/stop calls.
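
A minimal sketch of a helper module with this start/stop interface follows. It is a reconstruction from the description above, not the original code, and it uses time.process_time, since time.clock was removed in Python 3.8. The filename parameter is an addition to make the sketch easier to try out.

```python
import time

# (label, CPU time at start) of the benchmark in progress, if any
_current = None


def start(label):
    """Remember the label and the CPU time at which the benchmark began."""
    global _current
    _current = (label, time.process_time())


def stop(filename='benchmarks.txt'):
    """Append 'label, seconds' for the running benchmark to filename."""
    global _current
    label, started = _current
    elapsed = time.process_time() - started
    _current = None
    with open(filename, 'a') as f:
        f.write('%s, %.2f\n' % (label, elapsed))
    # Returns None on purpose, so a doctest line like
    # ">>> benchmark.stop()" produces no output to compare.
```

Appending rather than overwriting means repeated runs accumulate in the same file, which is what makes it possible to compare timings across runs later.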

Next, I experimented with various numbers of users (from 10 to 2000) and recorded the running times:

export view (10 users), 0.25
export view (10 users), 0.26
export view (100 users), 2.33
export view (200 users), 4.81
export view (200 users, cached), 4.47
export view (200 users), 4.91
...
export view (2000 users), 155.96
export view (2000 users, cached), 159.91

The "cached" entries are from the same doctest run, with the benchmark repeated a second time, to see whether ZODB caches have any effect (didn't expect any, as functional tests use an in-memory storage, and besides all the objects were created in the same thread).

Clearly, the run time grows nonlinearly. Some fiddling with Gnumeric (somewhat obstructed by Ubuntu bug 45341) showed a pretty clear N**2 curve. However, it is not fun to copy and paste the numbers manually into a spreadsheet. A couple of hours playing with matplotlib and cursing broken package dependencies, and I have a 109-line Python script that plots the results and shows the same curve.
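
The nonlinearity can also be checked without plotting anything. The rough sketch below is not the plotting script mentioned above. It parses lines in the benchmarks.txt format and computes the slope of log(time) against log(N) between consecutive user counts. A slope near 1 means linear growth and a slope near 2 means quadratic growth; a cost with both a linear and a quadratic term would drift from one toward the other as N grows. The function names and the regular expression are assumptions made for this example.

```python
import math
import re
from collections import defaultdict

# Matches "export view (200 users), 4.81" but not the "cached" variants.
LINE = re.compile(r'^export view \((\d+) users\), ([\d.]+)$')


def parse(lines):
    """Return {user count: mean CPU seconds}, ignoring unrelated lines."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            samples[int(match.group(1))].append(float(match.group(2)))
    return {n: sum(times) / len(times) for n, times in samples.items()}


def growth_exponents(means):
    """Slope of log(time) vs log(N) between consecutive user counts."""
    points = sorted(means.items())
    return [
        (n2, math.log(t2 / t1) / math.log(n2 / n1))
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]


if __name__ == '__main__':
    with open('benchmarks.txt') as f:
        for n, exponent in growth_exponents(parse(f)):
            print('up to %d users: exponent %.2f' % (n, exponent))
```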

Step two: profiling. (Well, actually, I did the profiling bit first, but at least I refrained from changing the code until I got the benchmark. And it would have been easier to profile if I had the benchmark in place and didn't need to do it manually. For the purposes of this narrative pretend that I did the right thing and created a benchmark first.)

A long time ago I was frustrated by the non-Pythonicity of the profile module (my definition of a Pythonic API is an API that I can use repeatedly without having to go reread the documentation every single time) and wrote a @profile function decorator. This came in handy:

class ExportView(BrowserView):

    template = ViewPageTemplateFile('templates/export.pt')

    @profile
    def __call__(self):
        return self.template()
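
For readers who don't have such a decorator at hand, a bare-bones version can be built on the standard library's cProfile and pstats modules. This is only a sketch of the idea and not the decorator used above: the choice to sort by cumulative time, print 20 rows, and write to stderr is arbitrary.

```python
import cProfile
import functools
import pstats
import sys


def profile(fn):
    """Profile every call to fn and print the hottest functions to stderr."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall profiles a single call and passes its result through
            return profiler.runcall(fn, *args, **kwargs)
        finally:
            # Report even if fn raised, so failed requests are visible too
            stats = pstats.Stats(profiler, stream=sys.stderr)
            stats.sort_stats('cumulative').print_stats(20)
    return wrapper
```

Because the wrapper returns whatever the wrapped function returns, it can be placed on a view's __call__ without changing what the view renders.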

One run and I see that 80% of the time is spent in a single function, SubjectVocabulary (users are called subjects internally, for historical reasons). It is registered as a vocabulary factory and iterates through all the users of the system, creating a SimpleTerm for each and stuffing all of those into a SimpleVocabulary. It was very simple and worked quite well until now. Well, now it's being called once per user (to convert the value of a Choice schema field into a user name), which results in O(N**2) behavior of the export view.
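
The quadratic pattern is easy to reproduce in isolation. The toy sketch below uses plain dicts instead of Zope vocabularies and counts how many terms get created. All names in it are made up, and export_shared is just one obvious alternative for comparison, not necessarily the fix that went into the application.

```python
def build_vocabulary(users, counter):
    """Mimic a vocabulary factory: creates one term per user on every call."""
    terms = {}
    for user_id, name in users:
        counter['terms'] += 1
        terms[user_id] = name
    return terms


def export_naive(users, counter):
    """Rebuild the whole vocabulary for each user: N * N terms."""
    return [build_vocabulary(users, counter)[user_id]
            for user_id, _ in users]


def export_shared(users, counter):
    """Build the vocabulary once and reuse it: N terms."""
    vocabulary = build_vocabulary(users, counter)
    return [vocabulary[user_id] for user_id, _ in users]


if __name__ == '__main__':
    users = [(i, 'user%d' % i) for i in range(1000)]
    for export in (export_naive, export_shared):
        counter = {'terms': 0}
        export(users, counter)
        print(export.__name__, counter['terms'])
```

Both functions return the same list of names; only the amount of work differs.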

A few minutes of coding, a set of new benchmarks, and here's the result:

Benchmark time graphs