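I'm hoping someone can help me with a problem I've run into. About a year ago I wrote a script that parses incoming emails and stores the data in a database.
The emails arrive with headers like this: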
-------- Forwarded Message --------
Subject: FS.G02 Fleet Street - j** associates (AG69)
Date: Thu, 14 Apr 2016 11:27:32 +0000
From: Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>
To: 'lucien@********.com' <lucien@********.com>
I use the following regular expressions and PHP code to separate out the various pieces of data ($text contains the email string above):
//Set RegEx to parse data out of text/plain email string
$re1 = '~(?<=From: )(.*?)(?: \<)(.*?)(?=\>)~';
$re2 = "~(?<=To: ').*(?=')~";
$re3 = "~(?<=Sent: ).*(?=)~";
$re4 = "~(?<=Subject: ).*(?=)~";
$re5 = "~(?<=Subject:\s)(.*?)(?=\s)(?:.*\s\-\s)(.*)~";
$re6 = "~\((.*?)\)~";
//Pull the data out using above expressions
if(preg_match($re1, $text, $matches1)) {
$from_name = $matches1[1];
$from_email = $matches1[2];
}
if(preg_match($re2, $text, $matches2))
$to_email = $matches2[0];
if(preg_match($re3, $text, $matches3))
$sent_date = $matches3[0];
if(preg_match($re4, $text, $matches4))
$subject_line = $matches4[0];
if(preg_match($re5, $text, $matches5)) {
$unit_code = $matches5[1];
$company_name = $matches5[2];
}
//Change sent date to timestamp
$sent_date = strtotime($sent_date);
//break the unit code and building code apart
$unit_code = explode('.',$unit_code,2);
$building_code = $unit_code[0];
$unit_code = $unit_code[1];
//break the (C0D3) off the end of the company / subject line
$company_name = preg_replace($re6,'' ,$company_name);
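This is the data I'm trying to separate so that I can store it in the DB.
My problem is that the script has stopped working properly. My regex no longer gives me the timestamp, and it no longer breaks the subject into its component parts:
FS.G02 Fleet Street - j** associates (AG69)
The code at the start is one piece of data I need. I then split it into the first two letters and the alphanumeric second half that follows.
FS.G02 Fleet Street - j** associates (AG69)
The second part I need always comes after the hyphen; it is the company/client name.
The format hasn't changed since this last worked, so I don't know whether I've broken the regex. Can anyone with more regex experience see where I've gone wrong?
Many thanks, Jonathan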
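Answer 0 (score: 1)
Have you tried using imap_rfc822_parse_headers() instead of regular expressions? That would certainly make things simpler. For the headers in your question it returns: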
object(stdClass)#1 (12) {
["date"]=> string(31) "Thu, 14 Apr 2016 11:27:32 +0000"
["Date"]=> string(31) "Thu, 14 Apr 2016 11:27:32 +0000"
["subject"]=> string(43) "FS.G02 Fleet Street - j** associates (AG69)"
["Subject"]=> string(43) "FS.G02 Fleet Street - j** associates (AG69)"
["toaddress"]=> string(69) "'lucien@********.com', UNEXPECTED_DATA_AFTER_ADDRESS@".SYNTAX-ERROR.""
["to"]=> array(2) {
[0]=> object(stdClass)#2 (2) {
["mailbox"]=> string(7) "'lucien"
["host"]=> string(13) "********.com'"
}
[1]=> object(stdClass)#3 (2) {
["mailbox"]=> string(29) "UNEXPECTED_DATA_AFTER_ADDRESS"
["host"]=> string(14) ".SYNTAX-ERROR."
}
}
["fromaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
["from"]=> array(1) {
[0]=> object(stdClass)#4 (3) {
["personal"]=> string(19) "Stephanie Zo*****ou"
["mailbox"]=> string(18) "Stephanie.Zo****ou"
["host"]=> string(14) "********.co.uk"
}
}
["reply_toaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
["reply_to"]=> array(1) {
[0]=> object(stdClass)#5 (3) {
["personal"]=> string(19) "Stephanie Zo*****ou"
["mailbox"]=> string(18) "Stephanie.Zo****ou"
["host"]=> string(14) "********.co.uk"
}
}
["senderaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
["sender"]=> array(1) {
[0]=> object(stdClass)#6 (3) {
["personal"]=> string(19) "Stephanie Zo*****ou"
["mailbox"]=> string(18) "Stephanie.Zo****ou"
["host"]=> string(14) "********.co.uk"
}
}
}
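To connect that object back to the variables in the question, here is a minimal sketch. The helper name fieldsFromParsedHeaders() and the array keys are my own, not part of any library. It reads the sender from the first entry of from and the timestamp from date; note that the sample headers carry a Date: line, whereas $re3 in the question looks for Sent:, which may be why no timestamp comes back. I have left the To address out because, as the dump shows, the single-quoted address does not parse cleanly.

```php
<?php
// Hypothetical helper, not from any library: maps the object returned by
// imap_rfc822_parse_headers() onto the fields the question stores.
function fieldsFromParsedHeaders(stdClass $h): array
{
    $from = $h->from[0] ?? null; // first From: address, if any

    return [
        'from_name'    => $from->personal ?? null,
        'from_email'   => isset($from->mailbox, $from->host)
            ? $from->mailbox . '@' . $from->host
            : null,
        // strtotime() returns false if the date string cannot be parsed
        'sent_date'    => isset($h->date) ? strtotime($h->date) : null,
        'subject_line' => $h->subject ?? null,
    ];
}

// Assumption: $text holds the raw header block, as in the question.
// The IMAP extension is optional, so check that the function exists first.
if (isset($text) && function_exists('imap_rfc822_parse_headers')) {
    $fields = fieldsFromParsedHeaders(imap_rfc822_parse_headers($text));
}
```

Keeping the mapping in its own function means it can be tried against a hand-built stdClass even on a machine where the IMAP extension is not installed.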
Here is a regular expression for your subject line:
([A-Z0-9]*\.[A-Z0-9]*)\s([A-Za-z\s]*)\s-\s([A-Za-z\s]*)\s(\([A-Z0-9]*\))
When called with preg_match(), for example:
$output = [];
$input = "FS.G02 Fleet Street - Something associates (AG69)";
preg_match("/([A-Z0-9]*\.[A-Z0-9]*)\s([A-Za-z\s]*)\s-\s([A-Za-z\s]*)\s(\([A-Z0-9]*\))/", $input, $output);
You will get something like this:
array(
0 => "FS.G02 Fleet Street - Something associates (AG69)",
1 => "FS.G02",
2 => "Fleet Street",
3 => "Something associates",
4 => "(AG69)"
)
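If it helps, the split into building code, unit code and company name that the question asks for could be sketched as below. This is a variation on the pattern above rather than the same expression: the function name splitSubject(), the array keys, and the assumed layout "<building>.<unit> <address> - <company> (<code>)" are my own guesses from the single sample subject. The company group is loosened to anything except parentheses, since [A-Za-z\s]* would reject a name containing digits or punctuation.

```php
<?php
// Hypothetical helper: splits a subject line such as
// "FS.G02 Fleet Street - Something associates (AG69)" into its parts.
function splitSubject(string $subject): ?array
{
    // Assumed layout: "<building>.<unit> <address> - <company> (<code>)".
    // The company group accepts anything except parentheses.
    $pattern = '/^([A-Z0-9]+)\.([A-Z0-9]+)\s+(.+?)\s+-\s+([^()]+?)\s*\(([A-Z0-9]+)\)\s*$/';

    if (!preg_match($pattern, trim($subject), $m)) {
        return null; // subject does not follow the assumed layout
    }

    return [
        'building_code' => $m[1],
        'unit_code'     => $m[2],
        'address'       => $m[3],
        'company_name'  => $m[4],
        'client_code'   => $m[5],
    ];
}

$parts = splitSubject('FS.G02 Fleet Street - Something associates (AG69)');
```

Returning null on a mismatch lets the calling script skip or log a message with an unexpected subject instead of storing half-parsed values.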