Random notes from mg

a blog by Marius Gedminas

Marius is a Python hacker. He works for Programmers of Vilnius, a small Python/Zope 3 startup. He has a personal home page at http://gedmin.as. His email is marius@gedmin.as. He does not like spam, but is not afraid of it.

Sat, 02 Oct 2004

Sending Unicode emails in Python

Sending a properly encoded email that contains non-ASCII characters is not as trivial as it should be. Here's more or less what I want:

# U+263A and U+263B are smiley faces (☺ and ☻)
sender = u'Sender \u263A <sender@example.com>'
recipient = u'Recipient \u263B <recipient@example.com>'
subject = u'Smile! \u263A'
body = u'Smile!\n\u263B'
send_email(sender, recipient, subject, body)

The hard part is getting all the Unicode strings properly encoded in the email. Details like multiple recipients, additional headers, attachments, SMTP configuration and error handling are ignored for the purposes of this article.

Here's the solution:

from smtplib import SMTP
from email.MIMEText import MIMEText
from email.Header import Header
from email.Utils import parseaddr, formataddr

def send_email(sender, recipient, subject, body):
    """Send an email.

    All arguments should be Unicode strings (plain ASCII works as well).

    Only the real name part of sender and recipient addresses may contain
    non-ASCII characters.

    The email will be properly MIME encoded and delivered through SMTP to
    localhost port 25.  This is easy to change if you want something different.

    The charset of the email will be the first one out of US-ASCII, ISO-8859-1
    and UTF-8 that can represent all the characters occurring in the email.
    """

    # Header class is smart enough to try US-ASCII, then the charset we
    # provide, then fall back to UTF-8.
    header_charset = 'ISO-8859-1'

    # We must choose the body charset manually
    for body_charset in 'US-ASCII', 'ISO-8859-1', 'UTF-8':
        try:
            body.encode(body_charset)
        except UnicodeError:
            pass
        else:
            break

    # Split real name (which is optional) and email address parts
    sender_name, sender_addr = parseaddr(sender)
    recipient_name, recipient_addr = parseaddr(recipient)

    # We must always pass Unicode strings to Header, otherwise it will
    # use RFC 2047 encoding even on plain ASCII strings.
    sender_name = str(Header(unicode(sender_name), header_charset))
    recipient_name = str(Header(unicode(recipient_name), header_charset))

    # Make sure email addresses do not contain non-ASCII characters
    sender_addr = sender_addr.encode('ascii')
    recipient_addr = recipient_addr.encode('ascii')

    # Create the message ('plain' stands for Content-Type: text/plain)
    msg = MIMEText(body.encode(body_charset), 'plain', body_charset)
    msg['From'] = formataddr((sender_name, sender_addr))
    msg['To'] = formataddr((recipient_name, recipient_addr))
    msg['Subject'] = Header(unicode(subject), header_charset)

    # Send the message via SMTP to localhost:25
    smtp = SMTP("localhost")
    smtp.sendmail(sender, recipient, msg.as_string())
    smtp.quit()
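
For readers on Python 3, here is a rough port of the same approach. Treat it as a sketch under assumptions rather than a tested replacement: it relies on the lowercase module names (email.mime.text, email.header, email.utils), the helper names pick_charset and build_message are made up for illustration, and it picks a charset for every header explicitly instead of relying on any fallback inside Header.

```python
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr, parseaddr
from smtplib import SMTP

# Candidate charsets, tried in order (same order as in the article).
CHARSETS = ('us-ascii', 'iso-8859-1', 'utf-8')


def pick_charset(text):
    """Return the first charset in CHARSETS that can represent text."""
    for charset in CHARSETS:
        try:
            text.encode(charset)
        except UnicodeError:
            continue
        return charset
    raise ValueError('text cannot be encoded in any candidate charset')


def build_message(sender, recipient, subject, body):
    """Build a MIME message from str arguments (hypothetical helper)."""
    # Split real name (which is optional) and email address parts.
    sender_name, sender_addr = parseaddr(sender)
    recipient_name, recipient_addr = parseaddr(recipient)

    msg = MIMEText(body, 'plain', pick_charset(body))
    # formataddr() RFC 2047-encodes a non-ASCII real name and raises
    # UnicodeEncodeError if the address part itself is not ASCII.
    msg['From'] = formataddr((sender_name, sender_addr),
                             pick_charset(sender_name))
    msg['To'] = formataddr((recipient_name, recipient_addr),
                           pick_charset(recipient_name))
    msg['Subject'] = Header(subject, pick_charset(subject))
    return msg


def send_email(sender, recipient, subject, body, host='localhost'):
    """Build the message and hand it to an SMTP server (port 25)."""
    msg = build_message(sender, recipient, subject, body)
    smtp = SMTP(host)
    try:
        # Pass bare addresses for the SMTP envelope.
        smtp.sendmail(parseaddr(sender)[1], [parseaddr(recipient)[1]],
                      msg.as_string())
    finally:
        smtp.quit()
```

Keeping build_message separate from send_email makes it possible to inspect the encoded headers and body without a mail server running.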

I wish I could write it like this:

from smtplib import SMTP
from email.MIMEText import MIMEText

def send_email(sender, recipient, subject, body):
    """Science-fictional simple version of send_email."""

    # The email module should be able to deal with Unicode message bodies and
    # headers and pick an appropriate charset automatically.  Today (on Python
    # 2.3) it just bombs out with a Unicode error when as_string() is called.
    msg = MIMEText(body)        # won't work
    msg['From'] = sender        # won't work
    msg['To'] = recipient       # won't work
    msg['Subject'] = subject    # won't work

    # At least the SMTP module is smart enough to discard the real name part
    # that it doesn't need
    smtp = SMTP("localhost")
    smtp.sendmail(sender, recipient, msg.as_string())
    smtp.quit()
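
On recent Python 3 versions the wish above seems to be mostly granted. The following is a minimal sketch, assuming Python 3.6 or later and the email.message.EmailMessage and email.headerregistry.Address classes; to_address and build_message are hypothetical names chosen for this example, and I have not tried it against every mail server.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.utils import parseaddr
from smtplib import SMTP


def to_address(text):
    """Turn 'Real Name <user@example.com>' into an Address object."""
    name, addr = parseaddr(text)
    return Address(display_name=name, addr_spec=addr)


def build_message(sender, recipient, subject, body):
    """Build a message; the email package picks the encodings itself."""
    msg = EmailMessage()
    msg['From'] = to_address(sender)
    msg['To'] = to_address(recipient)
    msg['Subject'] = subject
    msg.set_content(body)
    return msg


def send_email(sender, recipient, subject, body, host='localhost'):
    """Send the message through an SMTP server (port 25)."""
    msg = build_message(sender, recipient, subject, body)
    with SMTP(host) as smtp:
        # send_message() takes the envelope addresses from the headers.
        smtp.send_message(msg)
```

Headers are stored as Unicode text and only encoded when the message is serialized, so there is no charset to pick by hand.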
posted at 02:29 | 18 comments
Useful post.  I am pulling the email contents from XML and your snippet above was just what I was looking for.  Thanks.
posted by James Sullivan at Sun May 24 09:32:05 2009
Thanks a lot. Your post helped me format an HTML message for the Lotus Notes client :-)
posted by thinker at Fri Sep 4 16:44:02 2009
Thanks. Bullseye for what I needed.
posted by just someone at Wed Oct 7 10:22:27 2009
The email package is undergoing severe reworking these months to get Unicode-aware. Thanks to Python 3 for forcing programs to quit thinking 127 characters are enough.

Kind regards
posted by Merwok at Fri Nov 6 04:17:26 2009
Unbelievable! It works! (The whole secret, it turns out, is the third argument to MIMEText().)
posted by br at Fri Jul 2 10:27:26 2010
Looks like Python 2.7/3.2 will make this easier: http://bugs.python.org/issue1368247

(I always felt mildly guilty for not finding enough round tuits to work on this upstream.)
posted by Marius Gedminas at Mon Jul 5 21:53:32 2010
Thank you so much! Very useful
posted by Dmitriy at Tue Jul 27 12:41:30 2010
I love you, body))!!!! Thanks a lot!!!
posted by laginarius at Thu Nov 11 01:15:07 2010
I shortened the code a bit and it worked like a charm, thank you!
posted by Seb at Mon Dec 13 22:47:54 2010
I was facing this very problem for a small mailer program with spanish characters in the message.

Your post was music to my ears and now the problem is solved.

Merry Xmas & thanks a lot

  Javier
posted by Javier Reyes at Mon Dec 27 21:20:47 2010
Thank you! Very useful snippet!
posted by Nikolay at Thu Jun 9 11:53:19 2011
Many thanks, mg. Strangely, in my case I had to change these two lines:

header_charset = 'UTF-8'
msg['Subject'] = Header(unicode(subject,'UTF-8'), header_charset)

in order to get a proper email with UTF-8 in the subject and body. (That was after I found out that I had forgotten to add the # -*- coding: utf-8 -*- line to my script, which made me think that nothing would ever work.)

(python 2.6.5)
posted by Nick Demou at Wed Oct 26 20:28:36 2011
Nick, it seems that in your case 'subject' was an 8-bit string (with non-ASCII characters) rather than the Unicode string my method expected.
posted by Marius Gedminas at Wed Oct 26 20:31:32 2011
you're right Marius. I was calling it like this:

send_email('mb-ndemou@...', 'mb-ndemou@...',  'Greek text follows τεστ', u'Smile and Greek text follows \u263A  τεστ')

(note that one string is a u'' string and one is a regular '' string)
I still don't have a good grasp of unicode issues.
posted by Nick Demou at Thu Oct 27 10:01:44 2011
Smooth!! Perfect!!

Thank you!!!
posted by Ricard at Sun Jun 10 00:50:56 2012
I use Debian squeeze with Python 2.6 and the pt_BR UTF-8 locale. Your code worked very well, and using http://segfault.in/2010/12/sending-gmail-from-python/ I could send mail from Python directly through my Gmail account. I used ASCII for the header, sender and recipient, and UTF-8 for the subject and body. Many thanks for your code.
posted by gsavix at Wed Sep 26 09:07:34 2012
Thanks. Spot on, exactly what I was looking for.
posted by tamosius at Tue Oct 2 23:35:51 2012
How can I send an attachment along with this mail?
posted by COD at Wed Mar 19 08:07:57 2014
