I've been tinkering with a mail server lately and ran into something odd. When I open an mbox file, for example:

/var/spool/mail/root

the message bodies are full of strange 3D sequences:

<table cellpadding=3D"0" cellspacing=3D"0" style=3D"text-align:le=  
ft;color:#454545;background-color:#fff;font-size:14px;border-radius:10px;pa=  

Note the extra 3D after every = sign inside the tag, and the extra = at the end of each line.

What on earth is that?

After some searching, it turns out this is quoted-printable encoding. Like Base64, it is a content transfer encoding, and both are very common in email.
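To see how the two encodings differ, here is a small sketch using the standard library's quopri and base64 modules. The sample string is made up for illustration; it is not taken from a real message.

```python
import base64
import quopri

# Made-up sample: mostly ASCII, with one '=' and one non-ASCII character.
text = 'style="café"'
raw = text.encode("utf-8")

qp = quopri.encodestring(raw)   # stays readable: '=' becomes =3D, 'é' becomes =C3=A9
b64 = base64.b64encode(raw)     # every byte is re-packed, nothing stays readable

print(qp)
print(b64)

# Both encodings are lossless: decoding gives back the original bytes.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

Quoted-printable is the better fit for text that is mostly ASCII, such as HTML, because the result is still largely readable. Base64 suits binary data and text that is mostly non-ASCII.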

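The basics:

  1. An = at the very end of a line is a soft line break. The line was wrapped only because encoded lines may be at most 76 characters long, and the next line continues it directly. So:
    he=
    llo
    decodes to the single word hello.

  2. =3D in the middle of a line stands for a literal = sign (3D is the hexadecimal ASCII code of =).
    So style=3D"text" means style="text".

  3. Printable ASCII characters other than = are left as they are. Every other byte is written as = followed by the two hexadecimal digits of that byte's value. For example, 中 is the three bytes E4 B8 AD in UTF-8, so it is encoded as =E4=B8=AD.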
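The three rules are simple enough to try out by hand. Below is a minimal sketch of a decoder that implements only these rules; decode_qp is a name I made up for illustration, and the sample is a shortened version of the snippet above. For real work, the standard library's quopri.decodestring does the same job.

```python
import quopri
import re

def decode_qp(data: bytes) -> bytes:
    """Hand-rolled quoted-printable decoder covering only the three rules above."""
    # Rule 1: '=' at the end of a line is a soft line break; drop it and the newline.
    data = re.sub(rb"=[ \t]*\r?\n", b"", data)
    # Rules 2 and 3: '=XX' stands for the byte whose hexadecimal value is XX.
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)

sample = (b'<table cellpadding=3D"0" style=3D"text-align:le=\n'
          b'ft;color:#454545">')

print(decode_qp(sample).decode("ascii"))
# <table cellpadding="0" style="text-align:left;color:#454545">

# The standard library agrees with the hand-rolled version on this input.
assert decode_qp(sample) == quopri.decodestring(sample)
```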

That clears it up.

Here is a Python program that processes an mbox file; you can use it to read mail:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import mailbox
from email.header import decode_header, make_header

filename = "/var/spool/mail/zrr"
SEPARATOR = "-" * 97


def decode_mime_header(value):
    """Decode encoded words such as =?utf-8?b?...?= in a header value."""
    if value is None:
        return ""
    return str(make_header(decode_header(value)))


def print_body(part):
    """Print one non-multipart part with its transfer encoding undone."""
    # decode=True reverses the part's Content-Transfer-Encoding,
    # so quoted-printable and base64 bodies are both handled.
    payload = part.get_payload(decode=True)
    if payload is None:
        return
    charset = part.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:  # the message names a charset Python does not know
        text = payload.decode("utf-8", errors="replace")
    print(text)


mb = mailbox.mbox(filename)

for i, mes in enumerate(mb):
    print("\n\n\n\n\n")
    print(SEPARATOR)
    print("Email", i)
    print(SEPARATOR)

    subject = decode_mime_header(mes["Subject"])
    em_from = decode_mime_header(mes["From"])

    print("From: %s - Subject: %s" % (em_from, subject))
    print(SEPARATOR)

    # walk() also visits nested parts; skip the multipart containers themselves.
    for part in mes.walk():
        if not part.is_multipart():
            print_body(part)
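
The script runs Subject and From through email.header.decode_header because headers use a related scheme: encoded words of the form =?charset?encoding?text?=, where the encoding is b for Base64 or q for a quoted-printable variant. Here is a small sketch of that step on its own. The helper name decode_mime_header and the sample values are my own, chosen for illustration.

```python
from email.header import Header, decode_header, make_header

def decode_mime_header(value: str) -> str:
    """Turn a raw header value containing encoded words into plain text."""
    return str(make_header(decode_header(value)))

# Build an encoded header at run time, so no encoded bytes are typed by hand.
encoded = Header("邮件测试", "utf-8").encode()
print(encoded)                      # an encoded word starting with =?utf-8?
print(decode_mime_header(encoded))  # 邮件测试

# A hand-written q-encoded word: =3D is '=' here too, as in a message body.
print(decode_mime_header("=?utf-8?q?a=3Db?="))  # a=b

# Headers without encoded words pass through unchanged.
print(decode_mime_header("plain subject"))      # plain subject
```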