a blog by Marius Gedminas

Logging levels and hierarchies

I remember when the logging package seemed big and complicated and forbidding. And then I remember when I finally "got" it, started using it, even liked it. And then I've discovered that I didn't really understand the model after all.

Consider this: we have two loggers

   -- mylogger

configured as follows:

import logging
root = logging.getLogger()
mylogger = logging.getLogger('mylogger')

What happens when I do mylogger.debug('Hi')?

Answer: the message appears in both debug.log and info.log.

That was surprising to me. I'd always thought that a logger's level was a gatekeeper to all of that logger's handlers. In other words, I always thought that when a message was propagating from a logger to its parent (here from mylogger to root), it was also being filtered against the parent's log level. That turns out not to be the case.

What actually happens is that a message is tested against the level of the logger where it was initially logged, and if it passes the check, it gets passed to all the handlers of that logger and all its ancestors with no further checks. Unless the propagation is stopped somewhere by one of the loggers having propagate set to False. And of course each handler has its own level filtering. And I'm ignoring filters and the global level override.

Part of the confusion was caused by my misunderstanding of logging.NOTSET. I assumed, incorrectly, that it was just a regular logging level, one even less severe than DEBUG. So when I wrote code like this:

import logging
root = logging.getLogger()
mylogger = logging.getLogger('mylogger')
mylogger.setLevel(logging.NOTSET)  # <-- that's the default level, actually
mylogger.debug("debug message")

I saw the debug message being suppressed and assumed it was because of the root logger's level. Which is correct, in a way, just not the way I thought about it.

NOTSET does not mean "pass all messages through", it means "inherit the log level from the parent logger". The documentation actually describes this, although in a rather convoluted way. My own fault for misunderstanding it, I guess.

trace.py and os.fork()

Here's a simple Python script:

import subprocess
with open('hi.txt', 'w') as f:
  subprocess.Popen(['echo', 'Hello'], stdout=f).wait()

Here's what it does:

$ python2.7 hi.py
$ cat hi.txt

And here's what happens if you try to use the trace module to trace the execution:

$ python2.7 -m trace --trace hi.py
(lots of trace output omitted)

So far so good, but take a look at hi.txt:

$ cat hi.txt
subprocess.py(1170):                             _dup2(errwrite, 2)
 --- modulename: subprocess, funcname: _dup2
subprocess.py(1164):                                 if a == b:
subprocess.py(1166):                                 elif a is not None:
... 154 lines of unexpected trace output omitted ...
os.py(379):         try:
os.py(380):             func(fullname, *argrest)

The tracing hook installed by trace.py survives the os.fork() and interferes with subprocess output redirection. Now you know.

Imagine what this does to scripts that execute shell commands and try to parse their output.

Who wants to file a bug and/or provide a patch? I'm afraid I don't have enough round tuits...

logging.config.fileConfig gotcha

If you use logging.config.fileConfig (e.g. because you use paster serve something.ini to deploy your WSGI apps) you should know about this.

By default fileConfig disables all pre-existing loggers if they (or their parent loggers) are not explicitly mentioned in your .ini file.

This can result in unintuitive behaviour:

(if you don't see the embedded example, you can find it at https://gist.github.com/1642893).

If you have Python 2.6 or later (and you should), you can turn this off by passing disable_existing_loggers=False to fileConfig(). But what if it's not you calling fileConfig() but your framework (e.g. the above-mentioned paster serve)?

Now usually paster serve tries to configure logging before importing any of your application modules, so there should be no pre-existing loggers to disable. Sometimes, however, this doesn't work for one reason or another, and you end up with your production server suppressing warnings and errors that should not be suppressed. I haven't actually figured out yet who's responsible for those early imports in the application I'm working on (until today I assumed, incorrectly, that paster imports the module containing your WSGI app before it calls fileConfig).

If you're not sure if this bug can bite you or not, check that you don't have any disabled loggers by doing something like

import logging
assert not any(getattr(logger, 'disabled', False)
               for logger in logging.getLogger().manager.loggerDict.values())

while your application is running.

Converting a gnarly SVN repository to GIT: success!

I've received more feedback about my last night's post on gnarly svn to git migration than I've expected. Thanks to that feedback (and, mostly, Raffaele Salmaso for doing almost all the work and emailing the result to me) eazysvn is now on GitHub.

The rest of this post will describe the conversion (and verification) in detail, because if I ever need to do this again, I do not want to start from scratch.

Converting a gnarly SVN repository to GIT: FAIL.

eazysvn lives in a Subversion repository. I want to bring it (kicking and screaming) into the 21st century and put it on Github.

git-svn is unsuitable for the conversion, because in revision 50 I moved / to /trunk and added the traditional /tags and /branches. With git-svn I either get one third of the history that ignores everything before the layout switch, or I get directories named 'trunk', 'tags' and 'branches'.

Then I thought maybe hg would be smarter about the conversion, and then I could use hg-fast-export to convert hg to git. I enabled the hgsubversion extension:

$ sudo apt-get install hgsubversion
$ echo '[extensions'] >> ~/.hgrc
$ echo 'hgsubversion =' >> ~/.hgrc

(blog posts like this are a good reason why hg ought to steal the 'git config --global foo.bar=baz' syntax from git).

Then I converted the svn repository to Mercurial:

$ hg clone svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-hg

hg log -p confirmed that hg handled the conversion nicely looked right-ish at first glance, except the author information was nonsensical. To fix that:

$ rm -rf eazysvn-hg
$ echo 'mg = Marius Gedminas <marius@gedmin.as>' > AUTHORS
$ hg clone svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-hg -A AUTHORS

Unfortunately, a closer look at hg log now shows that two thirds of the history is lost: hg ignored everything before the layout restructuring, despite printing the log messages of those revisions as it went about the conversion. *sigh*.

Dear lazyweb, surely svn layout reorganization can't be such a rare thing that no tools in existence support it? What should I try next?

P.S. I also tried Bazaar, to see what it would do:

$ bzr branch svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-bzr
Repository with UUID 4fc293c4-4eed-0310-a01a-b4ad72f90fad at svn+ssh://fridge/home/mg/svn/eazysvn contains fewer revisions than cache. This either means that this repository contains an out of date mirror of another repository (harmless), or that the UUID is being used for two different Subversion repositories (potential repository corruption).
bzr: ERROR: exceptions.KeyError: 'missing revision paths for 78'

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 946, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 1150, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 699, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 721, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/builtins.py", line 1263, in run
  File "/usr/lib/python2.7/dist-packages/bzrlib/bzrdir.py", line 919, in open_tree_or_branch
    return bzrdir._get_tree_branch()
  File "/usr/lib/python2.7/dist-packages/bzrlib/controldir.py", line 410, in _get_tree_branch
    branch = self.open_branch(name=name)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/remote.py", line 420, in open_branch
    branch_path = self._determine_relpath(name)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/remote.py", line 369, in _determine_relpath
    layout = repos.get_layout()
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 701, in get_layout
    return self.get_layout_source()[0]
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 720, in get_layout_source
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 743, in _find_guessed_layout
    revnum, self._hinted_branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 143, in repository_guess_layout
    return logwalker_guess_layout(repository._log, revnum, branch_path=branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 149, in logwalker_guess_layout
    logwalker.iter_changes(None, revnum, max(0, revnum-GUESS_SAMPLE_SIZE)), revnum, branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 104, in guess_layout_from_history
    for (revpaths, revnum, revprops) in changed_paths:
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/logwalker.py", line 60, in iter_all_changes
    revpaths = get_revision_paths(revnum)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/logwalker.py", line 295, in get_revision_paths
    return self.cache.get_revision_paths(revnum)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/cache/tdbcache.py", line 187, in get_revision_paths
    raise KeyError("missing revision paths for %d" % revnum)
KeyError: 'missing revision paths for 78'

You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2012-02-19T22:12.crash
if a bug-reporting window does not automatically appear.