Tuesday, 2006-01-31

Converting from ISO-8859-1 to UTF-8 in Perl

When I post my observations via email, Gmail converts any Swedish characters to quoted-printable ISO-8859-1. However, this blog is in UTF-8. This is how I translate the body of the mail message.

#!/usr/bin/perl -w
use strict;
use MIME::QuotedPrint qw( decode_qp );
use Encode qw( decode encode );

# split the mail message into headers and body at the first blank line
my ( $headers, $body );
{
    local $/ = undef;
    # also accept CRLF line endings, in case the message arrives unconverted
    ( $headers, $body ) = split( /\r?\n\r?\n/, <STDIN>, 2 );
}

# decode the quoted-printable input
$body = decode_qp( $body );
# decode the ISO-8859-1 bytes to Perl's internal character format
$body = decode( 'iso-8859-1', $body );
# encode to UTF-8
$body = encode( 'utf-8', $body );

print $body, "\n";

The result is piped into a second script that formats the actual posting.
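
To see what the three conversion steps do, here is a minimal sketch that runs them on a made-up sample string. The helper name qp_latin1_to_utf8 and the sample text are invented for illustration and are not part of the script above. The last lines try Encode's from_to function, which converts a byte string in place and should produce the same bytes in a single call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::QuotedPrint qw( decode_qp );
use Encode qw( decode encode from_to );

# Hypothetical helper: quoted-printable ISO-8859-1 in, UTF-8 bytes out.
sub qp_latin1_to_utf8 {
    my ( $qp ) = @_;
    my $bytes = decode_qp( $qp );                  # raw ISO-8859-1 bytes
    my $chars = decode( 'iso-8859-1', $bytes );    # Perl character string
    return encode( 'utf-8', $chars );              # UTF-8 bytes
}

# Made-up sample: the word "Smörgås" in quoted-printable ISO-8859-1.
my $sample = "Sm=F6rg=E5s";
my $utf8   = qp_latin1_to_utf8( $sample );

# In UTF-8, ö is the byte pair 0xC3 0xB6 and å is 0xC3 0xA5.
print "ok\n" if $utf8 eq "Sm\xC3\xB6rg\xC3\xA5s";

# Alternative: from_to rewrites the byte string in place.
my $in_place = decode_qp( $sample );
from_to( $in_place, 'iso-8859-1', 'utf-8' );
print "same\n" if $in_place eq $utf8;
```

Run as is, this prints "ok" and then "same". The two-step decode and encode is handy when you want to work on the text as characters in between; from_to is shorter when all you need is the converted bytes.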

Pretty basic, eh? But until you know how, it can be a bit frustrating getting this to work.