As the title says, I'm trying to parse a large MBOX file (16,000 emails in a single file), and I'm working with a small file for testing.
So far, my PHP is:
$string = file_get_contents("test.mbox");
$matches = array(); //create array
$patt = '/name:\s([^\r]+)|email:\s([^\r]+)/';
preg_match_all($patt, $string, $matches); //find matching pattern
print_r($matches);
$fp = fopen('test.csv', 'w');
foreach ($matches as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
But my output needs to be in a format that is easy to import. Currently my regex returns:
Array (
[0] => Array (
[0] => name: Andrew
[1] => email: andrew@gmail.com
[2] => name: Second Dude
[3] => email: second@gmail.com.au
[4] => name: Stuart Richards
[5] => email: stuart@gmail.com
[6] => name: Stuart Richards2
[7] => email: stuart2@gmail.com
[8] => name: Stuart Richards3
[9] => email: stuart3@gmail.com )
[1] => Array (
[0] => Andrew
[1] =>
[2] => Second Dude
[3] =>
[4] => Stuart Richards
[5] =>
[6] => Stuart Richards
[7] =>
[8] => Stuart Richards
[9] => )
[2] => Array (
[0] =>
[1] => andrew@gmail.com
[2] =>
[3] => second@gmail.com.au
[4] =>
[5] => stuart@gmail.com
[6] =>
[7] => stuart2@gmail.com
[8] =>
[9] => stuart3@gmail.com ) )
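This is the data I want, but in the CSV it comes out like a crosstab query for each field. The first row contains the full matches, such as "name: Andrew,email: andrew@gmail.com", etc. The second row contains only the names, "Andrew,,Second Dude,,", etc., each in the column where it matched. The third row contains only the emails: ",andrew@gmail.com,second@gmail.com,,".
I have 16,000 emails that contain name:, email:, and two other headers, and I want to be able to import them into my database easily, so I need a CSV with one record per row:
NAME1,EMAIL1,PHONE1
NAME2,EMAIL2,PHONE2
NAME3,EMAIL3,PHONE3
Can someone help me? I've tried a lot of things, including processing the file after it has been written in the crosstab format, but with no luck. I also tried adding a newline after each regex match, with no luck either.
I only started using PHP over the weekend and have already used this site a lot! So if you can point me in the right direction to learn the syntax for what I'm trying to do, I'd appreciate it. I've reached the point where I'm clicking resource links I've already read ten times, so I thought I'd ask for some help. Cheers, Andrew
A sample of my test mbox file: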
---------- Forwarded message ----------
From:
Date: Sat, Jan 3, 2015 at 9:38 AM
Subject: campaign Campaign (.INFO)
To:
Visitor's IP: 58.165.117.
name: Andrew Cowley
suburb: Victoria point
email: andrew@gmail.com
phone: 04035752
powerbill: $500
System_Required:
Date:Sat-Jan-2015 10:38:00
Key:
from: - landing page
---------- Forwarded message ----------
From:
Date: Sat, Jan 3, 2015 at 9:38 AM
Subject: campaign Campaign (.INFO)
To:
Visitor's IP: 58.165.117.
name: Second Dude
suburb: Victoria point
email: second@gmail.com.au
phone: 04035752
powerbill: $500
System_Required: 3kW
Date:Sat-Jan-2015 10:38:00
Key:
from: Adwords - landing page
---------- Forwarded message ----------
From:
Date: Sat, Jan 3, 2015 at 9:38 AM
Subject: campaign Campaign (.INFO)
To:
Visitor's IP: 58.165.117.
name: Stuart Richards
suburb: Victoria point
email: mottu@gmail.com
phone: 04035752
powerbill: $500
System_Required: 3kW
Date:Sat-Jan-2015 10:38:00
Key:
from: Adwords - landing page
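Answer 0 (score: 0)
One idea is to build a pattern that extracts both pieces of information (name and email) in a single match, and to use the PREG_SET_ORDER option so that everything belonging to one match (the whole match and its capture groups) ends up in the same item of the result array: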
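// Load the file contents, as in the question's code
$mbox = file_get_contents("test.mbox");
$pattern = '~
    ^name: \h* (?<name> [^\r\n]+ ) \R
    .* \R                              # skip the suburb line
    email: \h* (?<mail> [^\r\n]+ )
~mx';
if (preg_match_all($pattern, $mbox, $m, PREG_SET_ORDER)) {
    // Each $item holds the whole match plus the named groups of one record
    foreach ($m as $item) {
        echo $item['name'] . ',' . $item['mail'] . PHP_EOL;
    }
}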
Glossary:
\R            # any kind of newline sequence
\h            # a horizontal whitespace character
(?<blah>...)  # a named capture group
^             # by default an anchor for the start of the string,
              # but when the m modifier is used, it becomes an anchor
              # for the start of each line
m modifier    # changes the meaning of ^ and $
x modifier    # switches on free-spacing mode (also called comment mode or verbose mode)
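Since the question asks for a CSV with one record per row, here is a minimal sketch of how the PREG_SET_ORDER result could be passed to fputcsv() instead of being echoed. The inline $mbox sample and the php://memory stream are assumptions made only to keep the example self-contained; with a real file you would read test.mbox and open test.csv as in the question.

```php
<?php
// Hypothetical sample standing in for the contents of the mbox file.
$mbox = "name: Andrew Cowley\nsuburb: Victoria point\nemail: andrew@gmail.com\nphone: 04035752\n"
      . "name: Second Dude\nsuburb: Victoria point\nemail: second@gmail.com.au\nphone: 04035752\n";

$pattern = '~
    ^name: \h* (?<name> [^\r\n]+ ) \R
    .* \R                              # skip the suburb line
    email: \h* (?<mail> [^\r\n]+ )
~mx';

// Collect one array per record: array(name, mail).
$rows = array();
if (preg_match_all($pattern, $mbox, $m, PREG_SET_ORDER)) {
    foreach ($m as $item) {
        $rows[] = array($item['name'], $item['mail']);
    }
}

// php://memory is used here only for the demo; use fopen('test.csv', 'w') for a real file.
$fp = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    // Delimiter, enclosure and escape are passed explicitly; fputcsv adds quotes where needed.
    fputcsv($fp, $row, ',', '"', '\\');
}
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);

echo $csv;
```

Because each $row already contains the fields of a single record, every call to fputcsv() writes one complete line, which avoids the crosstab layout described in the question.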
You can easily change this pattern to add other fields.
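As an illustration, a possible extension that also captures the phone number might look like the sketch below. It assumes the phone: line always comes directly after the email: line, as in the sample file from the question; the group name phone and the inline $mbox sample are hypothetical.

```php
<?php
// Hypothetical sample laid out like the records in the question.
$mbox = "name: Andrew Cowley\nsuburb: Victoria point\nemail: andrew@gmail.com\nphone: 04035752\npowerbill: \$500\n"
      . "name: Second Dude\nsuburb: Victoria point\nemail: second@gmail.com.au\nphone: 04035752\npowerbill: \$500\n";

$pattern = '~
    ^name: \h* (?<name>  [^\r\n]+ ) \R
    .* \R                               # skip the suburb line
    email: \h* (?<mail>  [^\r\n]+ ) \R
    phone: \h* (?<phone> [^\r\n]+ )     # assumed to directly follow the email line
~mx';

$records = array();
if (preg_match_all($pattern, $mbox, $m, PREG_SET_ORDER)) {
    foreach ($m as $item) {
        $records[] = array($item['name'], $item['mail'], $item['phone']);
    }
}

foreach ($records as $record) {
    echo implode(',', $record) . PHP_EOL;
}
```

If the field order can vary between messages, this fixed-order pattern would no longer match, and a different approach (for example, matching each field separately per message) would be needed.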
Note: if the mail or the name may sometimes be empty, you can change [^\r\n]+ to [^\r\n]*.
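To show the difference, here is a small sketch that runs both variants over a sample in which the second record has an empty email: line. The sample data and the variable names $strict and $lenient are made up for this illustration.

```php
<?php
// Hypothetical sample: the second record has nothing after "email:".
$mbox = "name: Andrew Cowley\nsuburb: Victoria point\nemail: andrew@gmail.com\n"
      . "name: Second Dude\nsuburb: Victoria point\nemail:\nphone: 04035752\n";

// Variant with +: a record is only matched when the email value is non-empty.
$strict = '~
    ^name: \h* (?<name> [^\r\n]+ ) \R
    .* \R
    email: \h* (?<mail> [^\r\n]+ )
~mx';

// Variant with *: an empty email value is accepted and captured as an empty string.
$lenient = '~
    ^name: \h* (?<name> [^\r\n]+ ) \R
    .* \R
    email: \h* (?<mail> [^\r\n]* )
~mx';

$strictCount  = preg_match_all($strict, $mbox, $strictMatches, PREG_SET_ORDER);
$lenientCount = preg_match_all($lenient, $mbox, $lenientMatches, PREG_SET_ORDER);

echo "strict: $strictCount, lenient: $lenientCount" . PHP_EOL;
```

With the + quantifier the record with the empty email is skipped entirely, while the * quantifier keeps it with an empty mail field, so the choice depends on whether incomplete records should reach the CSV or the database.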
Instead of building a CSV, you can use the foreach loop to insert the values into your database. I suggest starting a transaction before the loop:
if (preg_match_all($pattern, $mbox, $m, PREG_SET_ORDER)) {
    $dbh = null;
    try {
        $dbh = new PDO("mysql:host=$hostname;dbname=$dbname", $usr, $pwd);
        $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $dbh->beginTransaction();
        // Prepare once, execute once per record
        $query = 'INSERT INTO myTable (name, mail) VALUES (?, ?)';
        $sth = $dbh->prepare($query);
        foreach ($m as $item) {
            $sth->execute(array($item['name'], $item['mail']));
        }
        $dbh->commit();
    } catch (PDOException $e) {
        // The connection itself may have failed, so only roll back an open transaction
        if ($dbh !== null && $dbh->inTransaction()) {
            $dbh->rollBack();
        }
        echo "Error: " . $e->getMessage();
    }
    $dbh = null;
}