a blog by Marius Gedminas

Subversion troubles

I keep my home directory in a Subversion repository. A month ago that repository broke down. Subversion developers were unable to help me without access to the repository, which I didn't want to grant, since my repository contains some private data (Jabber account passwords and the like).

I spent a day debugging the problem until I found the cause (a loop in a data structure). Then I was too busy to do anything, and a month went by without regular backups. Two days ago I finally hacked a workaround and managed to extract a nearly complete repository dump. All is well again, except for my confidence in Subversion, which is a bit shaken.

All of my Subversion repositories now use the fsfs format; so far I have only had trouble with the bdb format.
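Switching formats is a dump/load affair. A sketch (the repository paths here are made up):

```shell
#!/bin/sh
# Sketch of a bdb -> fsfs migration via svnadmin dump and load.
# Paths are illustrative; point them at your own repositories.
migrate_to_fsfs() {   # usage: migrate_to_fsfs OLD_REPO NEW_REPO
    svnadmin dump "$1" > "$2.dump" &&
    svnadmin create --fs-type fsfs "$2" &&
    svnadmin load "$2" < "$2.dump"
}
```

After loading, point clients at the new repository (or swap the directories) and keep the dump file around as an extra backup.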

Putting ~ into Subversion

Backups: everyone knows they're important, but no one does them until they lose data (or barely escape losing data, if they're lucky).

My old laptop's old hard disk almost died once, and that made me think about backing things up. I read an inspiring article by Joey Hess, and decided to keep most of my home directory (all the numerous small files) in a Subversion repository on a remote server. It worked quite well for almost a year. I changed the hard disk, got a new laptop, and switched Linux distributions (Debian to Ubuntu), and every time all I had to do was install Subversion and check out my home directory.

I can also check out various subdirectories (such as ~/bin, ~/.mutt, ~/.vim) on other machines, and keep useful scripts and configuration files synchronized.

I have a shell script 'autocommit' that runs svn add and svn commit for a few common places (Tomboy notes, Firefox profile, IRC logs). I review changes to other files and commit them manually. I have listed a large number of automatically generated junk files in svn:ignore properties (~/.gconf falls into this category, since the diffs are numerous and useless).
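The script itself is nothing fancy; a sketch of the idea (the dot-directory names below are my guesses at where Tomboy, Firefox and the IRC client keep their files):

```shell
#!/bin/sh
# Sketch of an 'autocommit'-style script.  The directories listed are
# illustrative guesses, not the exact ones from my real script.
autocommit() {
    cd "$HOME" || return 1
    for d in .tomboy .mozilla/firefox .irclogs; do
        # 'svn st' marks unversioned files with '?'; add those first
        svn st "$d" | awk '$1 == "?" { print $2 }' | xargs -r svn add
    done
    svn commit -m "autocommit: periodic snapshot" .tomboy .mozilla/firefox .irclogs
}
```

This breaks on filenames with spaces, but for dot-directories full of notes and logs it is good enough.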

Some subdirectories are not versioned (~/img, ~/mp3, ~/src, ~/Mail). I back up large amounts of data (such as my photo collection) with rsync. I keep my source trees in separate repositories. I keep all my mail on an IMAP server, and use offlineimap to synchronize it with a bunch of Maildirs on the laptop.
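The rsync half of this is essentially one command; roughly (the target path on the server is made up):

```shell
#!/bin/sh
# Rough sketch of the rsync step: mirror the big unversioned directories
# to the home file server.  -a preserves metadata, --delete propagates
# removals so the mirror matches the laptop.  Target path is made up.
backup_big_dirs() {
    rsync -av --delete "$HOME/img" "$HOME/mp3" musmire:/backup/laptop/
}
```

Note that --delete makes this a mirror, not an archive: if I delete a photo by accident and then back up, it is gone from the server too.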

I do not keep my private keys (GPG and SSH) in the repository.

Here's how I do backups:

$ backup-to-musmire

Rsync ~/img, ~/mp3 etc. to my home file server.

$ offlineimap

Synchronize all my mail folders.

$ autocommit

Commit periodically changing files to Subversion.

(The three commands above can be run in parallel.)

$ svn st

See if there are any uncommitted changes. If there are, I'll commit them separately with meaningful log messages. If there are unknown files, I'll either svn add and svn commit them, or I'll add them to svn:ignore.

Open WiFi

A young guy just walked into our office. I initially thought he was selling something, but that turned out to be wrong. The guy said that he was using our open WiFi, and asked if we minded. We looked at each other, shrugged, and said no, as long as he didn't use too much bandwidth or do bad things like sending spam (we pay a flat rate, and use ssh/ssl for everything).

Then the guy asked for some technical help -- in particular, for an outgoing SMTP server that he could use. He mentioned that he lives in a building across the courtyard, that the connection was a bit weak, and asked whether we minded if he set up a repeater in his flat.

It turned out he was a designer, studying cinematography.

It was a little bit weird.

Beagle on Breezy

I upgraded to Breezy yesterday morning (1 hour for the dist-upgrade, 1 hour patching and compiling a kernel to fix the radeon power-drain bug, 2 hours figuring out why Firefox wouldn't start).

I decided to try out Beagle.

sudo apt-get install beagle
beagled
best&

This is all it takes to install it, run the indexing daemon, and run the search utility that sits in the tray. (I had enabled user_xattr in my /etc/fstab a while ago.)
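For reference, the relevant fstab line looks something like this (the device, mount point, and filesystem will of course differ on other machines):

```
# /etc/fstab -- user_xattr lets Beagle store metadata in extended attributes
/dev/hda3  /home  ext3  defaults,user_xattr  0  2
```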

I was impressed by the background indexer -- it really is extremely nonintrusive. No slowdowns, no excessive disk I/O; I couldn't even tell it was running. I left the laptop running overnight and went to sleep.

In the morning I discovered that the beagled daemon (a Mono process) had eaten 300 megs of virtual memory (all that was left, and then some -- the laptop started swapping). What is worse, it ate all the remaining disk space (I had about a gig left). ~/.beagle/ takes up 500 megs; the rest is probably metadata in extended attributes, scattered all over the place.

How do I measure the disk space taken by extended attributes? How do I strip them?
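I don't have a real answer yet, but the getfattr/setfattr tools from the attr package can at least list attributes and delete them by name; a sketch (the helper names are mine, and I'm only guessing that Beagle's attributes live under a user.Beagle prefix):

```shell
#!/bin/sh
# Sketch of poking at extended attributes with the attr package.
# Helper names are mine; Beagle's actual attribute names are a guess.
list_xattrs() {
    # dump every user.* attribute under a directory tree
    find "$1" -type f -exec getfattr --absolute-names -d {} + 2>/dev/null
}
strip_xattr() {
    # remove one named attribute from every file under a tree,
    # e.g. strip_xattr user.Beagle.Fingerprint ~/some/dir
    find "$2" -type f -exec setfattr -x "$1" {} + 2>/dev/null
}
```

That still doesn't tell me how much space the attributes themselves consume, which is the part I'd really like to measure.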

How do I discover how complete the index is? I presume beagled knows which files it has already indexed, and which it still plans to index in the future?

What are the RAM and disk costs of indexing 20 gigs of data (that's my ~ for you)? Will beagled eat inordinate amounts of RAM only during indexing, or always?

I only have 512 megs of RAM in this laptop; I do not want to sacrifice 60% of that to beagled. Likewise, 1 gig out of a measly 40 gig disk feels like a lot to pay for some convenience. I think I shall go back to locate + recursive grep for now. Or disable blanket indexing and only ask Beagle to index a few subdirs. I don't really need an index on all those Zope 3 and SchoolTool source trees.