So I've got supybot running and logging to a text file (one file per day). My next goal was to produce nice, aesthetically pleasing IRC logs in HTML.

First I resisted the urge to write my own IRC-log-to-HTML converter from scratch. Then I started googling and was surprised at how hard it was to find anything. I went through the whole IRC section on freshmeat.net. In the end I decided that irclog2html.pl by Jeff Waugh was the closest thing to my ideal, with only two deficiencies:

  1. It's written in Perl, so customization is difficult.
  2. It produces sloppy HTML4 output in ISO-8859-1, while I wanted XHTML/CSS in UTF-8.

The obvious next step was to port irclog2html.pl to Python, refactor it so that customizations (e.g. adding a new output style) are straightforward, and then improve it. I also wrote a test script that runs both irclog2html.pl and irclog2html.py and compares their output. Unfortunately, I did not have any log files to test with, so some bugs, or at least differences from the Perl version, may remain.
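The comparison script itself isn't published here, but the idea is simple. Below is a minimal sketch, assuming both converters take the log file path as their last argument and write HTML to standard output (if a converter writes to a file instead, you would read that file). `compare_converters` is a hypothetical name, not part of either script:

```python
import difflib
import subprocess


def compare_converters(cmd_a, cmd_b, logfile):
    """Run two converter commands on the same log file and return a
    unified diff of their standard output (an empty list if identical).

    Assumption: each command prints its HTML to stdout.
    """
    out_a = subprocess.run(cmd_a + [logfile], capture_output=True,
                           text=True, check=True).stdout
    out_b = subprocess.run(cmd_b + [logfile], capture_output=True,
                           text=True, check=True).stdout
    return list(difflib.unified_diff(
        out_a.splitlines(), out_b.splitlines(),
        fromfile=" ".join(cmd_a), tofile=" ".join(cmd_b), lineterm=""))
```

For example, `compare_converters(["perl", "irclog2html.pl"], ["python", "irclog2html.py"], "test.log")` would return the differing lines, assuming that invocation style matches how the scripts are actually run.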

That's how I spent the night from 1 AM to 9:30 AM. In the end I had an irclog2html.py with a couple of new output styles (xhtml and xhtmltable) and some bug fixes. It understands ISO 8601 timestamps (YYYY-MM-DDTHH:MM:SS, as found in IRC logs produced by supybot's ChannelLogger) and can produce navigation links (prev, next, index) if you specify them on the command line. You can see the end result here:

SchoolTool IRC logs
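To illustrate the ISO 8601 timestamp support, here is a minimal sketch of splitting one log line into its parts. It assumes each line looks like `2005-01-25T01:23:45  <nick> message` (timestamp, whitespace, then the nick in angle brackets); it is not the code irclog2html.py actually uses, and `parse_line` is a hypothetical name:

```python
import re
from datetime import datetime

# Assumed layout: "YYYY-MM-DDTHH:MM:SS<whitespace><rest of line>"
STAMP_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+(?P<rest>.*)$")
# Ordinary messages: "<nick> text"
NICK_RE = re.compile(r"^<(?P<nick>[^>]+)>\s?(?P<text>.*)$")


def parse_line(line):
    """Return (datetime or None, nick or None, text) for one log line."""
    line = line.rstrip("\n")
    when = None
    m = STAMP_RE.match(line)
    if m:
        when = datetime.strptime(m.group("stamp"), "%Y-%m-%dT%H:%M:%S")
        line = m.group("rest")
    m = NICK_RE.match(line)
    if m:
        return when, m.group("nick"), m.group("text")
    # Actions, joins/parts and server notices have no "<nick>" prefix.
    return when, None, line
```

Lines without a timestamp still parse, which keeps the same code usable for logs in older formats.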

I also wrote a second script, logs2html.py. It finds all log files in a directory, compares their mtimes to the mtimes of the corresponding HTML files, and runs irclog2html.py on the logs that have changed. It also produces an index page and passes irclog2html.py the command-line options it needs to create navigation links. This script now runs from cron every five minutes.
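The mtime check and the prev/next bookkeeping can be sketched in a few lines. This is an illustration under assumptions, not logs2html.py itself: it assumes log files end in `.log`, that each HTML page sits next to its log with an `.html` extension, and that filenames sort chronologically (e.g. they contain a YYYY-MM-DD date). `logs_needing_update` and `nav_links` are hypothetical names:

```python
import os


def logs_needing_update(log_dir):
    """Return sorted .log filenames whose .html page is missing or older."""
    stale = []
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        log_path = os.path.join(log_dir, name)
        html_path = log_path[:-len(".log")] + ".html"
        if (not os.path.exists(html_path)
                or os.path.getmtime(html_path) < os.path.getmtime(log_path)):
            stale.append(name)
    return stale


def nav_links(names):
    """Map each log name to its (prev, next) neighbours; None at the ends."""
    links = {}
    for i, name in enumerate(names):
        prev_name = names[i - 1] if i > 0 else None
        next_name = names[i + 1] if i + 1 < len(names) else None
        links[name] = (prev_name, next_name)
    return links
```

One thing to watch with this approach: when a new day's log appears, the previous day's page also needs a new "next" link, so a fuller version would regenerate that neighbouring page too, not only the stale ones.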

And here's the stylesheet used by both scripts: irclog.css.

Today I wrote an ugly, hacky script to split XChat log files into daily IRC log files suitable as input for logs2html.py, so that I could import past IRC conversations. I'm not publishing it because it's very ugly.

Update: irclog2html now has a web page. You can find the latest version there.