我可以轻松连接到IMAP邮件服务器:
use Mail::IMAPClient;
use MIME::Base64;
use MIME::Parser;
my $imap = Mail::IMAPClient->new(
Server => '192.168.2.2',
User => 'xxxxxx',
Password => 'yyyyyy',
Ssl => 1,
Uid => 1,
);
my $folders = $imap->folders
or die "List folders error: ", $imap->LastError, "\n";
print "Folders: @$folders\n";
$sfolder="INBOX.2012";
$imap->select( $sfolder )
or die "Select '$Opt{sfolder}' error: ", $imap->LastError, "\n";
my @msgs = $imap->messages or die "Could not messages: $@\n";
然而,由于这样的代码,我想要的文本和html不容易解析:
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----------=_4F0F4830.7079357A"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_4487195_1184536749.1326753403034"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/alternative; boundary=--boundary_164442_d184e417-739f-
46d6-824a-6ea1846e79de
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_3882878_23916831.1326509484032"
Multipart
Content-Transfer-Encoding:
我试过这个,但它只适用于少数不同的编码。
if ($imap->get_header($msg,"Content-Transfer-Encoding")=~ /base64/i) {
print "\nMatch base64";
if ($imap->get_header($msg,"Content-Type")=~m/text/i ) {
push(@mail,decode_base64($imap->body_string($msg)));
}
elsif ($imap->get_header($msg,"Content-Type")=~m/image/i )
{ print "\nImage detected"; }
elsif ($imap->get_header($msg,"Content-Type")=~m/application/i )
{ print "\nApplication detected"; }
有7bit和8bit变体以及其他编码方法,其中包含我希望以后使用的 html或text 。我成功使用decode_base64()来解码base64。要解码的更糟糕的是包含多部分代码的那些。我觉得我正在重新发明轮子,必须有一个图书馆或模块,可以为我做所有繁重的工作。 其他内容类型(如.jpg,.gif和.pdf)应该被忽略。多部分电子邮件包含至少一部分我感兴趣的部分,但很多部分对我没用。
经过进一步的研究,这个结构有一些我需要的信息,但不知道如何有效地解决它是另一回事。
Dumping:$VAR1 = bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_002_BC64_7D688C1F.A2FF9BE0'
},
'bodyextra' => undef,
'_top' => 1,
'bodydisp' => 'NIL',
'_id' => 'HEAD',
'bodysubtype' => 'mixed',
'PartsIndex' => {
'1.3' => bless( {
'bodyparms' => 'NIL',
'bodyid' => '<d9e26cc0-019c-4ac0-9b1e-9c9ac8424f52>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.3',
'bodysubtype' => 'jpeg',
'_prefix' => '1.3',
'bodysize' => '4808',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_000_36AE_880DDD08.0A776E35'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => '1.1',
'bodysubtype' => 'alternative',
'_prefix' => '1.1',
'bodytype' => 'MULTIPART',
'bodystructure' => [
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.1',
'bodysubtype' => 'PLAIN',
'_prefix' => '1.1.1',
'bodysize' => '1971',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '74',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' ),
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.2',
'bodysubtype' => 'HTML',
'_prefix' => '1.1.2',
'bodysize' => '23364',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '331',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_001_EA96_2BF8DEDE.32622D51'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => 1,
'bodysubtype' => 'related',
'_prefix' => 1,
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1.1'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<5dff39db-e81c-4410-be75-8662564fd328>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.2',
'bodysubtype' => 'jpeg',
'_prefix' => '1.2',
'bodysize' => '14406',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
$VAR1->{'PartsIndex'}{'1.3'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<717f2ef4-f795-4d1c-87cc-283c9b0a59b0>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.4',
'bodysubtype' => 'gif',
'_prefix' => '1.4',
'bodysize' => '2912',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.2' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[1],
'1.1.2' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[1],
'2' => bless( {
'bodyparms' => {
'name' => 'BKD-7361945220.pdf'
},
'bodyid' => 'NIL',
'bodyextra' => 'NIL',
'bodydisp' => {
'attachment' => {
'filename' => 'BKD-7361945220.pdf'
}
},
'_id' => 2,
'bodysubtype' => 'octetstream',
'_prefix' => 2,
'bodysize' => '47540',
'bodytype' => 'application',
'bodyMD5' => 'NIL',
'bodystructure' => [],
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.4' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[3],
'1.1.1' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[0]
},
'_prefix' => 'HEAD',
'PartsList' => [
1,
'1.1',
'1.1.1',
'1.1.2',
'1.2',
'1.3',
'1.4',
2
],
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1'},
$VAR1->{'PartsIndex'}{'2'}
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' );
正如您所看到的那样,没有任何值可以保证是PartsIndex上每个部分的一部分,有些值是嵌套的。
每个PartsIndex项感兴趣的变量: 体型 bodysubtype bodyenc
答案 0 :(得分:1)
使用Courriel解析邮件:
use strictures;
use Mail::IMAPClient qw();
use Courriel qw();
sub walk_parts {
my ($obj, $callback) = @_;
if ($obj->is_multipart) {
for my $part ($obj->parts) {
walk_parts($part, $callback);
}
} else {
$callback->($obj);
}
}
my $imap = Mail::IMAPClient->new(
…
) or die $@;
my $folders = $imap->folders
or die $imap->LastError;
$imap->select('INBOX')
or die $imap->LastError;
my @messages = $imap->messages
or die $imap->LastError;
for my $id (@messages) {
my $raw = $imap->message_string($id)
or die $imap->LastError;
my $email = Courriel->parse(text => $raw);
walk_parts $email, sub {
my ($part) = @_;
my $content = $part->content;
my $type = $part->mime_type;
}
}
答案 1 :(得分:1)
我尝试使用一些预先构建的模块,但它们有太多的依赖项并且难以使用。此解决方案不会添加原始之外的依赖项。我也遇到了libMagic依赖关系的问题,见上文,我也不希望任何使用我程序的人也必须处理这个问题。
您必须为主父母一次调用解码两次,并为每个孩子再次调用解码。由于这个$ imap-&gt; get_bodystructure($ msg);包含您需要的所有信息,以便添加不需要的依赖项。花了很多时间才弄清楚如何手动解码它,但这是值得的。
您可以将所需的解码器添加到decode()子例程中。我只需解码那里的text / html和base64编码版本。 IMAPClient函数为您提供所有父母和孩子的列表,因此您不必自己制作列表。棘手的部分是你可以拥有任意数量的父母,每个父母都有许多孩子,但只有孩子包含有用的数据。父母可以被忽略,因为他们的许多值都是空白,undef或'NIL'(字面意思)。实际上,大量变量的值为“NIL”。即使是电子邮件客户端可以为用户回答的内容,例如bodyMD5和bodylang,也通常等于'NIL'。由于压倒性地使用'NIL'解析和使用其他字段可能证明是徒劳的。取决于你的imap服务器和收到你的电子邮件的人可能会有所不同。
如果您还有其他问题,请发表评论。
use Mail::IMAPClient;
use MIME::Base64;
use MIME::Parser;
sub decode {
($process, $imap) =@_;
if ($process->bodytype eq "TEXT") {
print "\n Text SubType:".$process->bodysubtype;
if ($process->bodyenc eq "base64") {
return decode_base64($imap->bodypart_string($msg,$process->id));
}
elsif (index(" -7bit- -8bit- -quoted-printable- ",lc($process->bodyenc)) !=-1 ) {
return $imap->bodypart_string($msg,$process->id);
}
print "\n==========Insert new decoder here============";
print "\n".$imap->bodypart_string($msg,$process->id);
print "\n=================================================";
}
return "";
}
#insert your login code with credentials here
$imap->select( $sfolder )
or die "Select '$Opt{sfolder}' error: ", $imap->LastError, "\n";
my @msgs = $imap->messages or die "Could not messages: $@\n";
foreach $msg (@msgs) {
my $raw = $imap->message_string($msg)
or die $imap->LastError;
$struct = $imap->get_bodystructure($msg);
#MULTIPART is a container designation and does not contain anything useful by itself.
#However it will still process all of the children that have content
if ($struct->bodytype ne "MULTIPART") { print "\n BodyEnc:".$struct->bodyenc();}
$rDecode=decode($struct,$imap);
#do not insert blanks.
if ($rDecode ne "" && (length($rDecode)>2)) {push(@mail,$rDecode); }
foreach $dumpme ($struct->bodystructure()) {
if ($dumpme->bodytype() eq "MULTIPART") {next;}
$rDecode="";
$rDecode=decode($dumpme,$imap);
#do not insert blanks.
if (($rDecode ne "") && (length($rDecode)>2) ) {
push(@mail,$rDecode); }
}
}
答案 2 :(得分:0)
您需要一个MIME解析器。不幸的是,即使这样,您也需要对自己进行一些规范化,因为有多种方法可以在MIME中表示相同的数据。