Using preg_match_all to separate a list of labels and values in a text body

Time: 2012-09-20 21:17:56

Tags: php regex

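I am setting up a PHP script that will process emails sent from a maintenance help desk. The emails are generated by a web form used by our client company, which I have no control over. The format of the email is standard, but it contains a list of labels and values supplied from the web form. I want to use regular expressions to split this list and put the labels and values into an array that I can feed into my own database. I have a working solution, but I'm very new to regex and I'm sure there is a better/more efficient way.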

An example of an email I might receive:

Dear *MY COMPANY*,

A new job has been raised, please see details below.
If you are unable to action this job request, please notify the Maintenance Help Desk on xxx-xxxx as soon as possible.

    Job Type: Man In Van
    Job Code: 1462399
    Due Date: 27/09/2012 07:21:10
    Response Time: Man In Van
    Pub Number: 234
    Pub Name: pub name, location
    Pub Address: 123 somewhere, some place XX1 7XX
    Pub Post Code: XX1 7XX
    Pub Telephone Number: xxx xxxx
    Placed By: Ben
    Date/time placed: 20/09/2012 07:21:10
    Trade Type: Man In Van
    Description: List of jobs emailed by Chris, carried out by Martin Baker. No callout on system currently, although jobs already completed, just need signing off.


    For any queries, please either contact the pub directly, telephone the Maintenance Help Desk on xxx-xxxx or reply to this e-mail.

Many Thanks
*CLIENT COMPANY* 

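There is more boilerplate around it, and obviously email headers and so on, but you get the idea. Each email contains only one list and the labels will stay the same, but I'd like to future-proof it so that if they add new fields I won't need to change my code. I want to end up with an array like: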

$job['Job Type'] = Man in van
$job['Job Code'] = 1462399
...
$job['Description'] = List of all jobs emailed ... just need signing off.

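Although I can be confident the format won't change, every field is user input from the form and so can be unpredictable, especially the description, which may contain line breaks.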

This is the code I'm currently using:

// Rip out the job details from the email
preg_match_all('/job type\:.*description\:.*\s{3}F/is', $the_email, $jobs);

// For each job returned (should always be one but hey)
foreach ($jobs[0] as $job_details) {

    // Get the variables from the job description
    preg_match_all('/(\w[^\:]*)\: ([\w\d][^\*]+)/i', $job_details, $the_vars);

}

// For each row returned, put into an array with the first group as the key and the second as the value
for ($i = 0; $i < count($the_vars[0]); $i++) {

    $arr[$the_vars[1][$i]] = $the_vars[2][$i];

}

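It works, but it's ugly and I'm sure there is a better way. The main problem I'm having is with the description section: I can't simply capture the text after the ":" up to the newline, because the description itself may contain line breaks.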
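To make the problem concrete, here is a minimal sketch of the naive "value ends at the newline" approach. The fragment and the pattern are made up for illustration (they are not the code used above); the point is only that the second line of the description is silently dropped.

```php
<?php
// Hypothetical fragment: a single-line field followed by a two-line description.
$fragment = "    Placed By: Ben\n"
    . "    Description: First line of the description.\n"
    . "Second line of the description.\n";

// Naive pattern: label, colon, then a value that stops at the end of the line.
// With the /m modifier, ^ and $ anchor to each line; "." never crosses a newline.
preg_match_all('/^\s*([^:\n]+): (.*)$/m', $fragment, $m);
$fields = array_combine($m[1], $m[2]);

// The description is cut off after its first line.
var_dump($fields['Description']);
```

Any line-by-line pattern of this kind has the same limitation, which is why the description needs to be handled separately.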

Any suggestions greatly appreciated!

1 Answer:

Answer 0: (score: 0)

Still not the prettiest thing in the world, but it should work just fine!

preg_match_all('/\s{3}[ ]*([^:]+): ([^\n]+)/', $subject, $matches);
$job = array_combine($matches[1], $matches[2]);

preg_match_all('/Description\: (.*)\s{3}For any queries/is', $subject, $match);
$job['Description'] = trim($match[1][0]);

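The first preg_match_all does the thing you said isn't really useful on its own: it just grabs all of the fields using the whitespace, the colon, and the newline.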

The second replaces the possibly incorrect Description key that the first one filled in.
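For anyone who wants to try the two steps end to end, here is a self-contained sketch. The `$subject` string is a shortened stand-in for the email in the question, and the two-line description is an assumption added to exercise the multi-line case. The `!empty()` guard is also an addition, so that an email without the "For any queries" footer does not trigger an undefined-index notice.

```php
<?php
// Shortened stand-in for the email body (assumed sample data, not a real message).
$subject = "A new job has been raised, please see details below.\n"
    . "\n"
    . "    Job Type: Man In Van\n"
    . "    Job Code: 1462399\n"
    . "    Description: List of jobs emailed by Chris.\n"
    . "No callout on system currently.\n"
    . "\n"
    . "\n"
    . "    For any queries, please either contact the pub directly.\n";

// Step 1: grab every "Label: value" pair; each value stops at the first newline,
// so a multi-line Description is truncated at this point.
preg_match_all('/\s{3}[ ]*([^:]+): ([^\n]+)/', $subject, $matches);
$job = array_combine($matches[1], $matches[2]);

// Step 2: re-capture Description up to the footer line and overwrite the
// truncated value. The /s modifier lets "." match newlines.
preg_match_all('/Description\: (.*)\s{3}For any queries/is', $subject, $match);
if (!empty($match[1])) {
    $job['Description'] = trim($match[1][0]);
}

print_r($job);
```

Note that the second pattern depends on the footer starting with "For any queries", so it would need adjusting if the client changes that wording.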