Removing spam from Mailman's queue - Random notes from mg

Mailman is a wonderful mailing list manager, but when you have thousands of spam messages sitting in the moderation queue, its web interface is not enough.

The held messages live as Python pickles on the file system, in the Mailman data directory. The file name pattern is heldmsg-listname-number.pck. Newer versions of Mailman¹ come with a script called discard that takes a list of path names on the command line and discards them all. In other words, to get rid of all held messages, all you have to do is type

/usr/lib/mailman/bin/discard /var/lib/mailman/data/heldmsg*

(you may have to change the directory names to suit your Mailman installation).

¹ Mailman 2.1.5 has the discard script, Mailman 2.1.2 doesn't.
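
Before discarding anything, it can help to see how many held messages each list has. Here is a minimal sketch that counts them, assuming the file names really follow the heldmsg-listname-number.pck pattern described above. The function name count_held is made up for this example, and the code never touches Mailman itself:

```python
import collections
import os
import re

# Assumed pattern: heldmsg-<listname>-<number>.pck
# List names may contain hyphens themselves, so the number is
# anchored at the end of the name.
HELD_RE = re.compile(r'^heldmsg-(?P<listname>.+)-(?P<number>\d+)\.pck$')


def count_held(paths):
    """Return a Counter mapping list name to number of held messages."""
    counts = collections.Counter()
    for path in paths:
        match = HELD_RE.match(os.path.basename(path))
        if match:  # silently skip files that do not fit the pattern
            counts[match.group('listname')] += 1
    return counts
```

You could feed it the output of glob.glob('/var/lib/mailman/data/heldmsg*') and print the resulting counts.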

However, I want to be really sure that the messages I'm discarding are spam. The most straightforward way to do that is to extract the RFC 2822 messages from Mailman's pickles and pipe them to spamassassin. I could not find a script for message extraction included with Mailman, so I had to write my own (mmextract.py):

#!/usr/bin/env python
"""
Extract an email message from a Mailman pickle.

Usage: mmextract.py filename > outputfile
"""
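import sys
import cPickle

# The pickle holds a Mailman message object, so the Mailman package
# must be importable before unpickling.
sys.path.insert(0, '/usr/lib/mailman') # you might need to change this

def main(argv=sys.argv):
    if len(argv) < 2:
        print __doc__
        return
    # Pickles are binary data, so open the file in binary mode.
    msg = cPickle.load(open(argv[1], 'rb'))
    print msg.as_string()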

if __name__ == '__main__':
    main()
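
If you want to see the mechanism without a Mailman installation at hand, here is a small stand-alone sketch. It assumes that the held pickle stores a message object that behaves like the standard library's email message (mmextract.py only relies on it having as_string()). The addresses and the in-memory buffer are made up for the example:

```python
import io
import pickle
from email.message import Message

# Build a stand-in for a held message.
msg = Message()
msg['From'] = 'sender@example.com'
msg['Subject'] = 'Test'
msg.set_payload('Hello\n')

# Pickle it into an in-memory buffer instead of a heldmsg-*.pck file.
buf = io.BytesIO()
pickle.dump(msg, buf)

# Load it back and flatten it to RFC 2822 text, as mmextract.py does.
buf.seek(0)
restored = pickle.load(buf)
text = restored.as_string()
```

The variable text now holds the headers, a blank line, and the body, which is the form spamassassin expects on its standard input.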

The rest is a matter of simple shell scripting:

for fn in /var/lib/mailman/data/heldmsg*; do
    # spamassassin -e exits with a non-zero status when the message is spam
    ./mmextract.py "$fn" | spamassassin -L -e > /dev/null || echo "$fn"
done | xargs /usr/lib/mailman/bin/discard

(untested, but it should work).
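
The same check can be sketched in Python, if you prefer that to shell. This is only an illustration: the function name is_spam is made up, and the one assumption it shares with the shell loop is that spamassassin -L -e exits with a non-zero status for spam. The command is a parameter so that the function can be tried out without spamassassin installed:

```python
import subprocess


def is_spam(message_bytes, command=('spamassassin', '-L', '-e')):
    """Pipe a raw message to the checker and report whether it exited non-zero.

    Like the shell loop, this treats any non-zero exit status as "spam",
    including the case where the checker itself failed.
    """
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    proc.communicate(message_bytes)  # send the message, discard the output
    return proc.returncode != 0
```

Since a crashed checker looks the same as a spam verdict here, it may be worth printing the file names first and reviewing them before handing them to discard.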