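I'm trying to extract the images embedded in an email. The problem is that the image I save is unreadable, and I can't figure out why. The email (saved as a file that I load at the start of the code):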
MIME-Version: 1.0
Received: by 10.100.120.7 with HTTP; Tue, 18 Oct 2011 10:36:48 -0700 (PDT)
In-Reply-To: <8B4FDE07A4759840B84FD04B4C88100B010135E81D8C@fxildc03.forexmanage.com>
References: <8B4FDE07A4759840B84FD04B4C88100B010135E81D8C@fxildc03.forexmanage.com>
Date: Tue, 18 Oct 2011 19:36:48 +0200
Delivered-To: s.shpiz@gmail.com
Message-ID: <CAEb-As9XVmciajFAwEaFyF8CE4QG0t-Z5zFDDpMWXLqaBur1sA@mail.gmail.com>
Subject: openme
From: Simeon Shpiz <s.shpiz@gmail.com>
To: me <s.shpiz@gmail.com>
Content-Type: multipart/related; boundary=001636c5977303b92404af962ba6
--001636c5977303b92404af962ba6
Content-Type: multipart/alternative; boundary=001636c5977303b91d04af962ba5
--001636c5977303b91d04af962ba5
Content-Type: text/plain; charset=ISO-8859-1
****
--001636c5977303b91d04af962ba5
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div class=3D"gmail_quote"><div lang=3D"EN-US" link=3D"blu=
e" vlink=3D"purple"><div><p class=3D"MsoNormal"><span style=3D"font-size:11=
.0pt;color:#1F497D"><img width=3D"15" height=3D"13" src=3D"cid:image003.png=
@01CC8DCD.30A2A7C0"></span><span style=3D"font-size:11.0pt;color:#1F497D"><=
u></u><u></u></span></p>
</div>
</div></div><br></div>
--001636c5977303b91d04af962ba5--
--001636c5977303b92404af962ba6
Content-Type: image/png; name="image003.png"
Content-Transfer-Encoding: base64
Content-ID: <image003.png@01CC8DCD.30A2A7C0>
X-Attachment-Id: 3e79c375acccec3d_0.1
iVBORw0KGgoAAAANSUhEUgAAAA4AAAANCAIAAAAWvsgoAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAO
yAAADsMBrahYpwAAAItJREFUKFNj/P//PwNxgIk4ZWBVQFOBoBsMsGqrqqr6CgYsaNIPHz6EiMjJ
yb19+xbISE9PLy4uBjLQlSLrFBYWBnITExN9fHyADMJulZCQgOgnrFRUVJRYpXAnETb19evXxJr6
4sULiFJ8IfDt2zegii1btmRkZGBRKi8vjxbSwKjJysoCCjISnwYATtwwhahioZoAAAAASUVORK5C
YII=
--001636c5977303b92404af962ba6--
The Python code I'm using:
import email
from BeautifulSoup import BeautifulSoup
message = email.message_from_file(open(r'C:\shpiz\test\msg\12248'))
cid_list = []
images = []
for part in message.walk():
    if str(part.get_content_type()) == 'text/html':
        soup = BeautifulSoup(part.get_payload(decode=True))
        cid = '<%s>'%soup('img')[0]['src'][4:]
        cid_list.append(cid)
for part in message.walk():
    if part.get('Content-ID') in cid_list :
        images.append((part.get_filename(),part.get_payload(decode=True)))
for name, image in images:
    with open(r'c:\shpiz\test\%s'%name,'w') as f:
        f.write(image)
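Unfortunately, the saved image is broken: no program can open it.
I compared the original image file and the new one in Notepad++ and they differ. It looks like my generated copy contains a line break that is not in the original. That is not the only difference, though, because deleting that line in Notepad++ does not make the image openable. The difference I'm describing can be seen here
Any help in tracking down the problem would be much appreciated.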
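Answer 0 (score: 2)
You are writing the image in text mode, and Python is mangling the line endings: on Windows, text mode turns every "\n" byte into "\r\n", which corrupts binary data. Open the file in wb mode so the bytes are written verbatim.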
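A minimal sketch of that fix, assuming Python 3 and a tiny stand-in message instead of the email from the question. The names `RAW_BYTES`, `raw_message`, `save_images` and the `BOUNDARY` marker are hypothetical and exist only for this illustration; the one relevant change is the "wb" mode.

```python
import base64
import email
import os
import tempfile

# Hypothetical stand-in payload: a PNG-like header plus bytes containing b"\n",
# which is exactly what text mode would rewrite on Windows.
RAW_BYTES = b"\x89PNG\r\n\x1a\n" + b"line1\nline2\n"
encoded = base64.b64encode(RAW_BYTES).decode("ascii")

# Hypothetical one-part message, not the email shown in the question.
raw_message = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/related; boundary=BOUNDARY\n"
    "\n"
    "--BOUNDARY\n"
    'Content-Type: image/png; name="demo.png"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-ID: <demo.png@example>\n"
    "\n"
    + encoded + "\n"
    "--BOUNDARY--\n"
)


def save_images(message, directory):
    """Write every image part to `directory` in binary mode; return the paths."""
    saved = []
    for part in message.walk():
        if part.get_content_maintype() != "image":
            continue
        data = part.get_payload(decode=True)  # base64-decoded bytes
        path = os.path.join(directory, part.get_filename())
        with open(path, "wb") as f:  # "wb": no newline translation
            f.write(data)
        saved.append(path)
    return saved


message = email.message_from_string(raw_message)
out_dir = tempfile.mkdtemp()
paths = save_images(message, out_dir)
```

Reading a saved file back with mode "rb" and comparing it to the decoded payload is a quick way to confirm that nothing was altered on the way to disk.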
Answer 1 (score: 0)
The problem is here:
for name, image in images:
    with open(r'c:\shpiz\test\%s'%name,'w') as f:
        f.write(image)
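By default, a file created with open is a text file. You have to use 'b' together with 'w'. I don't know whether that will solve the whole problem, though. You may also need a specialized graphics file reader/writer.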
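Copying the decoded bytes should not require an image library. One inexpensive check, sketched here under the assumption of Python 3, is to compare the first eight bytes with the PNG signature, which contains both "\r\n" and "\n" so that newline translation breaks it. `FIRST_LINE` is the first base64 line of the attachment in the question (only a prefix of the file); `looks_like_png`, `png_dimensions` and `simulate_text_mode` are hypothetical helpers, and the last one only imitates what a Windows text-mode write does.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# First base64 line of the attachment quoted in the question (a prefix only).
FIRST_LINE = (
    "iVBORw0KGgoAAAANSUhEUgAAAA4AAAANCAIAAAAWvsgoAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAO"
)


def looks_like_png(data):
    """Return True if the data starts with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk that follows the signature."""
    if not looks_like_png(data) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", data[16:24])


def simulate_text_mode(data):
    """Imitate Windows text-mode output: every b'\\n' becomes b'\\r\\n'."""
    return data.replace(b"\n", b"\r\n")


header = base64.b64decode(FIRST_LINE)
mangled = simulate_text_mode(header)
print(looks_like_png(header), looks_like_png(mangled))
```

If the signature check passes on the file written with 'wb' but fails on the one written with 'w', the write mode is the likely culprit and no graphics-specific writer is needed.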