a blog by Marius Gedminas

Looking for an IRC bot

I'm looking for an IRC bot, mostly to provide logs of IRC conversations. I want an extensible one that is written in Python, so that I can add extra features (e.g. announcements of subversion commits). I want one that is mature, usable, and available in Debian.

apt-cache search python irc bot gives me two answers:

  • libsoap-lite-perl (wha..?)
  • supybot

Looks like I'll be investigating supybot. It appears to have a lot of features, most of which I'm not interested in, though some look interesting.

Profiling/tracing a single function

Sometimes you want to profile just a single function in your Python program. Here's a module that lets you do just that: profilehooks.py. Sample usage:

#!/usr/bin/python
from profilehooks import profile

class SampleClass:

    def silly_fibonacci_example(self, n):
        """Return the n-th Fibonacci number.

        This is a method rather than a function just to illustrate that
        you can use the 'profile' decorator on methods as well as global
        functions.

        Needless to say, this is a contrived example.
        """
        if n < 1:
            raise ValueError('n must be >= 1, got %s' % n)
        if n in (1, 2):
            return 1
        else:
            return (self.silly_fibonacci_example(n - 1) +
                    self.silly_fibonacci_example(n - 2))
    silly_fibonacci_example = profile(silly_fibonacci_example)


if __name__ == '__main__':
    fib = SampleClass().silly_fibonacci_example
    print fib(10)

(If you have Python 2.4, you can use @profile as a decorator just before the function definition instead of rebinding silly_fibonacci_example.)

Demonstration:

mg: ~$ python sample.py
55

*** PROFILER RESULTS ***
silly_fibonacci_example (sample.py:6)
function called 109 times

         325 function calls (5 primitive calls) in 0.004 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    108/2    0.001    0.000    0.004    0.002 profilehooks.py:79(<lambda>)
    108/2    0.001    0.000    0.004    0.002 profilehooks.py:131(__call__)
    109/1    0.001    0.000    0.004    0.004 sample.py:6(silly_fibonacci_example)
        0    0.000             0.000          profile:0(profiler)

This decorator is useful when you do not want the profiler output to include time spent waiting for user input in interactive programs, or time spent waiting for requests in a network server.
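
For the curious, here is a rough sketch of how a decorator like this could be built on top of the standard cProfile and pstats modules. This is not how profilehooks is actually implemented; the names profile_calls and report are made up for the illustration, and the code is written for Python 3. The nesting counter is there so that recursive calls do not try to start the profiler twice.

```python
import cProfile
import functools
import io
import pstats


def profile_calls(fn):
    """Hypothetical decorator: profile only the time spent inside fn."""
    profiler = cProfile.Profile()
    depth = 0  # nesting level, so recursion does not re-enable the profiler

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal depth
        depth += 1
        try:
            if depth == 1:
                profiler.enable()
            return fn(*args, **kwargs)
        finally:
            depth -= 1
            if depth == 0:
                profiler.disable()

    def report():
        """Return the collected statistics as text (call fn at least once first)."""
        out = io.StringIO()
        stats = pstats.Stats(profiler, stream=out)
        stats.sort_stats('cumulative').print_stats()
        return out.getvalue()

    wrapper.report = report
    return wrapper


@profile_calls
def fib(n):
    """Return the n-th Fibonacci number (deliberately naive)."""
    if n < 1:
        raise ValueError('n must be >= 1, got %s' % n)
    if n in (1, 2):
        return 1
    return fib(n - 1) + fib(n - 2)


result = fib(10)
text = fib.report()
print(result)
print(text)
```

Anything that happens outside the decorated function (waiting for input, waiting for a request) never reaches the profiler, because it is only enabled for the duration of the outermost call.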

In a similar vein you can produce code coverage reports for a function.

#!/usr/bin/python
import doctest
from profilehooks import coverage

def silly_factorial_example(n):
    """Return the factorial of n."""
    if n < 1:
        raise ValueError('n must be >= 1, got %s' % n)
    if n == 1:
        return 1
    else:
        return silly_factorial_example(n - 1) * n
silly_factorial_example = coverage(silly_factorial_example)


if __name__ == '__main__':
    print silly_factorial_example(1)

Demonstration:

mg: ~$ python sample2.py
1

*** COVERAGE RESULTS ***
silly_factorial_example (sample2.py:5)
function called 1 times

       def silly_factorial_example(n):
           """Return the factorial of n."""
    1:     if n < 1:
>>>>>>         raise ValueError('n must be >= 1, got %s' % n)
    1:     if n == 1:
    1:         return 1
           else:
>>>>>>         return silly_factorial_example(n - 1) * n

2 lines were not executed.

I found it useful to discover whether a given function or a method was adequately covered by unit tests.
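
The underlying idea can be sketched with sys.settrace from the standard library. The following is a simplified illustration written for Python 3, not the code profilehooks uses: record_lines and its executed attribute are hypothetical names, and the sketch only remembers which lines ran (as offsets from the def line) instead of printing an annotated listing.

```python
import functools
import sys


def record_lines(fn):
    """Hypothetical decorator: remember which lines of fn were executed."""
    code = fn.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Only follow frames that run fn's own code object.
        if frame.f_code is not code:
            return None
        if event == 'line':
            # Store line numbers as offsets from the "def" line.
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            return fn(*args, **kwargs)
        finally:
            # Put back whatever trace function was active before.
            sys.settrace(previous)

    wrapper.executed = executed
    return wrapper


def silly_factorial_example(n):
    """Return the factorial of n."""
    if n < 1:
        raise ValueError('n must be >= 1, got %s' % n)
    if n == 1:
        return 1
    else:
        return silly_factorial_example(n - 1) * n
silly_factorial_example = record_lines(silly_factorial_example)

value = silly_factorial_example(1)
lines_run = sorted(silly_factorial_example.executed)
print(value, lines_run)
```

With n == 1 only the two if tests and the return 1 line are recorded; the raise and the recursive return never run, which matches the two unexecuted lines in the report above.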

Update: profilehooks is now a proper easy_install'able Python package.

Diffing dicts

Say you are comparing two large dicts in a unit test, for example:

    form = extract_form(rendered_html)
    self.assertEquals(form, {'field1': u'value1',
                             'field2': u'value2',
                             ...
                             'field42': u'value42'})

When this test fails, a useful trick is to ask the test runner to drop into Pdb inside assertEquals (SchoolTool and Zope 3 test runners have a command line option -d for this) and type the following:

(Pdb) from sets import Set
(Pdb) pp list(Set(first.items()) ^ Set(second.items()))

You will get a list of (key, value) pairs that differ:

[('field.comp.c3.b.NEW', u''),
 ('field.comp.c1.b.NEW', u''),
 ('field.comp.c1.title', 'New stuff'),
 ('field.comp.c1.b.b2', u'A2'),
 ('field.comp.c3.b.b1', u'New behaviour'),
 ('field.comp.c3.title', u'New stuff'),
 ('SUBMIT', u'Save'),
 ('field.comp.c1.title', u'Comp 1'),
 ('field.comp.c2.b.b1', 'B1'),
 ('field.comp.c3.description', u'New description'),
 ('field.comp.c2.description', 'Comp two'),
 ('field.comp.c1.b.b1', u'A1'),
 ('field.comp.c2.title', 'Comp 2'),
 ('SUBMIT', 'Submit'),
 ('field.comp.c2.b.b2', 'B2'),
 ('field.comp.c1.description', 'New description'),
 ('field.comp.c1.description', u'Comp one'),
 ('field.comp.c1.b.b1', 'New behaviour')]

If there are many differences, sorting the list is a good idea:

(Pdb) sorted = lambda l: (l.sort(), l)[1]
(Pdb) pp sorted(list(Set(first.items()) ^ Set(second.items())))

Python 2.4 makes this simpler (builtin set, builtin sorted).
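
With the builtins, the whole trick fits into a small helper. This is a sketch for Python 3; dict_diff is a name invented here, the sample dicts are made up, and it assumes all values are hashable (and comparable with each other, for the sort).

```python
def dict_diff(first, second):
    """Return a sorted list of (key, value) pairs present in only one dict."""
    return sorted(set(first.items()) ^ set(second.items()))


first = {'field1': 'value1', 'field2': 'value2', 'field3': 'value3'}
second = {'field1': 'value1', 'field2': 'CHANGED', 'field4': 'value4'}

diff = dict_diff(first, second)
for pair in diff:
    print(pair)
```

A key whose value changed shows up twice, once with each value, while a key missing from one side shows up once.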

Tools of the trade

During the PyPy sprint I noticed that there are a lot of good development tools that people do not know about. There are also extremely convenient but little-known features in tools that people already use. I think I should mention a few of them:

VIM
Vim is a very powerful text editor. It takes some getting used to, but that time is well worth the increased productivity you get as a result. My usual programming environment is a GVIM window with the source code and a terminal window for running unit tests.
Keyword completion in vim
If you want to write clear code, you have to use clear names that are sometimes on the longish side. Instead of typing the whole name again and again, you can just type the first three or four characters and hit Ctrl+P. If there are several names with the same prefix, keep hitting Ctrl+P until you get the one you want. If you overshoot, use Ctrl+N, which looks for matching names in the other direction. Vim looks for names in the current file, then in all opened files, then in tags and finally in all included files. You can also complete file names, whole phrases etc. Type :help ins-completion for the full documentation.
Ctags
Run ctags -R in the root of the source tree of a project. It will build a tags database in a text file called "tags". The database contains all names (functions, classes, etc.) defined in the source code and locations of those definitions. Once in vim, type :tag somename and vim will jump to the definition of somename. Alternatively, move the cursor to a name and hit Ctrl+] to do the same. If the same name is defined in several places, you can use :tsel to see a list of them all and choose the one you want to go to. Tags are also useful for keyword completion described above.
GNU id-utils
While tags let you quickly find the definition of a name, id-utils let you find all the places where that name is used. Think "grep on steroids". Usage is similar: run mkid in the project root to build a name database (a file named "ID"), then use gid name anywhere in the project tree to list all filenames and line numbers where name is mentioned. In vim you can :set grepprg=gid and use :grep name to perform queries and get a list of results in the error window (:cw).

Update: GNU id-utils ignore Python source files by default (boo!). To get around that, copy the example id-lang.map from the id-utils distribution to your home directory, add *.py text at the end, and alias mkid to mkid -m ~/id-lang.map.

PyPy sprint in Vilnius

I spent the last week participating in a PyPy sprint. It was fun. I've learned a lot of obscure Python tricks that I'll try to avoid in the future if I do not want my code to be obscure. I had the chance to see how graphviz and Pygame can be combined into an easy and pleasant-to-use debugging tool for looking at intricate data structures. Best of all -- I met a number of fine Python programmers: Armin Rigo, Holger Krekel, Michael Hudson, Bob Ippolito, Christian Tismer, Laura Creighton, Jacob Hallen.

I'm glad that my intricate Machiavellian plot worked to perfection: I couldn't convince myself that I could afford the time and money to take a couple of weeks off and fly somewhere to participate in a PyPy sprint, so I dropped a few hints and in the end helped organize a sprint here in Vilnius [1].

[1] "Here in Vilnius" is technically incorrect, since I'm now on a plane over Amsterdam, halfway to London.