Latin-1 or Windows-1252? - Random notes from mg

Michael Foord wrote about some Latin-1 control character fun in a blog that's hard to read (the RSS feed syndicated on Planet Python is truncated, grr!) and hard to reply (~~no comments on the blog!~~ my Chromium's AdBlock+ hid the comment link so I couldn't find it), but never mind that.

Unfortunately the data from the customers included some \x85 characters, which were breaking the CSV parsing.

0x85 is a control character (NEXT LINE or NEL) in Latin-1, but it's a printable character (HORIZONTAL ELLIPSIS) in Microsoft's code page 1252, which is often mistaken for Latin-1. I would venture a suggestion that the encoding of the customer data was not latin-1 but rather cp1252.

>>> '\x85'.decode('cp1252')
u'\u2026'