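I want to use a regular expression in PHP to extract the body of each message from the following chain email. The chain email is saved as a .txt file. Any HTML tags that appear inside a body should be left untouched by the extraction.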
$content = <<<HEREDOC
From: Matrimony <matrimony@mangalsutrabandhan.in>
Sent: Fri, 12 Aug 2011 16:17:40
To: "matrimony@mangalsutrabandhan.com" <matrimony@mangalsutrabandhan.in>
Subject: Re: bride search
From: brides <sales@mangalsutrabandhan.com>
Sent: Fri, 12 Aug 2011 15:49:52
To: "Matrimony " <matrimony@mangalsutrabandhan.in>
Cc: "groom" <brides@mangalsutrabandhan.com>
Subject: Re: bride search
PFA
Regds.,
sales
From: shaadi <kundaali@mangalsutrabandhan.in>
Sent: Tue, 22 Feb 2011 16:40:24
To: <vivaah@mangalsutrabandhan.com>, <bandhan@mangalsutrabandhan.com>
Cc: "'lagna '" <lagna@mangalsutrabandhan.in>, <movies@mangalsutrabandhan.in>, <manishv@mangalsutrabandhan.com>, "'beta data'" <channel@mangalsutrabandhan.com>, "'test S'" <city@mangalsutrabandhan.com>
Subject: Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
From: shaadi [nikaah:kundaali@mangalsutrabandhan.in]
Sent: 21 February 2011 23:09
To: vivaah@mangalsutrabandhan.com; bandhan@mangalsutrabandhan.com
Cc: 'lagna '; movies@mangalsutrabandhan.in; manishv@mangalsutrabandhan.com;
Subject: data transfer would be made live for 145 test
Hi,
gtsdhsdbh
anbdsmbsa
sda the data test .
Would request you to send in your feedback.
Thanks and Regards,
beta data
assa xyz
P Please do not print this e-mail unless it is absolutely necessary
HEREDOC;
Output:
Array
(
[0] => Array
(
[0] => Re: bride search
[1] => Re: bride search
PFA
Regds.,
sales
[2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
)
[1] => Array
(
[0] => Re: bride search
[1] => Re: bride search
PFA
Regds.,
sales
[2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
)
)
The regular expression I used to get the output above:
preg_match_all('/(?<=Subject: )(.*?[\n][\s]*?)(?=From:)/is',$content,$rest);
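But it does not return the last message, because there is no following 'From:' to delimit the data in between. I hope that is clear. If there is any other way to do this, please let me know.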
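One possible way around the missing trailing 'From:' is to let the lookahead also accept the end of the input. The sketch below is a variation on the pattern above, not a tested drop-in: `$content` here is a shortened, hypothetical stand-in for the chain mail, and it assumes that no body line starts with "From:".

```php
<?php
// Shortened, hypothetical stand-in for the chain mail in the question.
$content = "From: a <a@example.com>\nSubject: Re: first\n"
         . "From: b <b@example.com>\nSubject: second\nHi,\nlast body\n";

// m: ^ matches at the start of every line; s: . also matches newlines.
// The lookahead stops either before the next line that starts with "From:"
// or at the very end of the input (\z), so the last message is captured too.
preg_match_all('/(?<=Subject: )(.*?)(?=^From:|\z)/ms', $content, $rest);

print_r($rest[1]);
```

`\z` is used rather than `$` because, with the `m` modifier, `$` would match at the end of every line and the lazy `.*?` would stop after the subject line. As in the original pattern, each captured piece still begins with the subject text.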
preg_match_all('/(?m:^From:\x20(?<From>[^\n]*)\n^Sent:\x20(?<Sent>[^\n]*)\n^To:\x20(?<To>[^\n]*)\n(?:^Cc:\x20(?<Cc>[^\n]*)\n)?^Subject:\x20(?<Subject>[^\n]*)\n)(?<Body>.*?(?=(?:\nFrom:)|$))/s',$content,$matches);
echo "<pre>".print_r($matches,true);
It gives almost the correct output. Should I provide the text file at http://www.mangalsutrabandhan.com?
Answer 0 (score: 0)
You need some smarter parsing to make sense of this. Whatever produced this file has altered the structure of the emails:
Subject: Re: bride search
PFA
There should be at least one blank line between what looks like part of the email header and its body.
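For comparison, when that blank line is present, separating headers from body needs no heuristics. A minimal sketch, assuming a single well-formed message (`$raw` is a made-up example, not taken from the file in the question):

```php
<?php
// Hypothetical well-formed message: headers, one blank line, then the body.
$raw = "From: a <a@example.com>\nSubject: Re: bride search\n\nPFA\n\nRegds.,\nsales\n";

// \R matches any newline sequence (\n, \r\n, ...), so \R\R is the first blank line.
// The limit of 2 leaves blank lines inside the body untouched.
[$headers, $body] = preg_split('/\R\R/', $raw, 2);

echo $body;
```

The file in the question has lost that separator, which is why this simple split cannot be applied to it directly.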
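Then you run into the problems of top-posting (you cannot rely on the timestamps in the headers without knowing the time zone), incomplete headers, and no quoting.
So even if you build a heuristic that parses this, there are too many scenarios it cannot cope with.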
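To make that concrete, here is roughly what such a heuristic could look like. It is only a sketch under strong assumptions: every message starts with a line beginning with "From:", and its header block is an unbroken run of From/Sent/To/Cc/Subject lines. The function name `extractBodies` and the sample text are hypothetical.

```php
<?php
// Heuristic sketch only; see the assumptions stated above.
function extractBodies(string $content): array
{
    // Zero-width split before every line that starts with "From:".
    $messages = preg_split('/^(?=From:)/m', $content, -1, PREG_SPLIT_NO_EMPTY);

    $bodies = [];
    foreach ($messages as $message) {
        // Strip the leading run of known header lines; the remainder is treated as the body.
        $body = preg_replace('/\A(?:(?:From|Sent|To|Cc|Subject):.*\R?)+/', '', $message);
        $bodies[] = trim($body);
    }
    return $bodies;
}

$sample = "From: a <a@example.com>\nSent: Fri, 12 Aug 2011 16:17:40\nSubject: Re: bride search\n"
        . "From: b <b@example.com>\nSubject: Re: bride search\nPFA\nRegds.,\nsales\n";

print_r(extractBodies($sample));
```

It fails in exactly the ways described: a body line that begins with "From:" starts a bogus new message, a body whose first line begins with "To:" or "Cc:" is swallowed as a header, and a header name outside the fixed list ends up in the body.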