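I have a file containing a list of people with their phone numbers and email addresses.
For example:
Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk
Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk
The list continues like this for hundreds of entries. How can I write a regular expression that reads the file from top to bottom, extracts the first-and-last-name line and the email address, and puts them together like this:
Name, Email Address
Any help would be great.
I have the following code, but it only reads the email addresses: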
$string = file_get_contents("physio.txt"); // Load text file contents
// don't need to preassign $matches, it's created dynamically
// this regex handles more email address formats like a+b@google.com.sg, and the i makes it case insensitive
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
// preg_match_all returns the number of full matches and fills $matches with the results
preg_match_all($pattern, $string, $matches);
// the data you want is in $matches[0], dump it with var_export() to see it
echo "<pre>";
$input = $matches[0];
echo count($input);
echo "<br>";
$result = array_unique($input);
echo count($result);
echo "<br>";
//print_r($result);
echo "</pre>";
Answer 0: (score: 1)
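Regex seems like a sensible way to parse this data. The important thing is to include enough components in the pattern to keep the matches accurate.
I suggest the following: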
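Pattern: ~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m
The neighbouring substrings Location: and Email: are used to make sure the correct substrings are targeted: the name is the line immediately before Location:, and the address is whatever follows the next Email: label.
The m pattern modifier improves accuracy by letting ^ match at the start of every line, not just at the start of the string.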
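To make the effect of the m modifier concrete, here is a minimal sketch. The two-line sample string and the variable names are made up for illustration and are not taken from the real file.

```php
<?php
// Hypothetical two-line sample; the address is a placeholder.
$text = "Sally Coulthard\nEmail: sally@example.com";

// Without the m modifier, ^ matches only at the very start of the string,
// so the "Email:" label on the second line is not found.
$withoutM = preg_match('~^Email: (\S*)~', $text);

// With the m modifier, ^ also matches immediately after each newline.
$withM = preg_match('~^Email: (\S*)~m', $text, $match);

var_dump($withoutM);   // int(0)
var_dump($withM);      // int(1)
echo $match[1], "\n";  // sally@example.com
```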
Breakdown:
~ #pattern delimiter
^ #match start of a line
(.+) #capture one or more non-newline characters (Capture Group #1)
\R #match a newline character (\r, \n, \r\n)
Location: #match literal: "Location" followed by colon
[\s\S]*? #match (lazily) zero or more of any character
^Email: #match start of a line, literal: "Email", colon, space
(\S*) #capture zero or more visible characters (Capture Group #2 -- quantifier means the email value can be blank and still valid)
~ #pattern delimiter
m #pattern modifier tells regex engine that ^ means start of a line instead of start of the string
Code:
$input = "Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk
Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk";
if (preg_match_all("~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m", $input, $matches, PREG_SET_ORDER)) {
foreach ($matches as $data) {
echo "{$data[1]}, {$data[2]}\n";
}
}
Output:
Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk
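To get from this to the "Name, Email Address" listing the question asks for, the pattern can be wrapped in a small helper. This is only a sketch: extractContacts() is a hypothetical name, the inline sample stands in for the contents of physio.txt, and the trim() call is an assumption to guard against a stray \r ending up in the captured name if the file uses Windows line endings.

```php
<?php
/**
 * Return a list of [name, email] pairs found in $text.
 * extractContacts() is a hypothetical helper, not a built-in function.
 */
function extractContacts(string $text): array
{
    $pattern = '~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m';
    $pairs = [];
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // trim() strips surrounding whitespace, including a trailing \r
            $pairs[] = [trim($m[1]), $m[2]];
        }
    }
    return $pairs;
}

// In real use this would be: $text = file_get_contents("physio.txt");
$text = "Coulthard\nSally Coulthard\nLocation: Surrey\nTel: 07865095005\n"
      . "Email: sally@veterinaryphysio.co.uk\n"
      . "Kate Haynes\nLocation: Surrey, Sussex, Kent\n"
      . "Email: katehaynesphysio@yahoo.co.uk\n";

$lines = ["Name, Email Address"];
foreach (extractContacts($text) as [$name, $email]) {
    $lines[] = "$name, $email";
}
echo implode("\n", $lines), "\n";
```

Returning pairs rather than echoing inside the loop keeps the matching separate from the output format, so the same result could be written with fputcsv() instead if a proper CSV file is wanted.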
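Answer 1: (score: 0)
You can split the content on double newlines and then process each block. To get the first and last name, take the last line that does not contain ": ":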
$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
$lines = explode("\n", $block);
$mail = end($lines);
$mail = substr($mail, strlen('Email: '));
$lines = array_reverse($lines);
$fnln = '';
foreach ($lines as $line) {
if (strpos($line, ': ') === false) {
$fnln = $line;
break;
}
}
echo $fnln . ", " . $mail . "<br>";
}
Output:
Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk
Or, if the email is not always the last line of the block:
$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
$lines = explode("\n", $block);
$lines = array_reverse($lines);
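$fnln = '';
$mail = ''; // reset for each block so a block without an Email line does not reuse the previous address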
foreach ($lines as $line) {
if (substr($line, 0, 6) == 'Email:') {
$mail = substr($line, 7);
}
if (strpos($line, ': ') === false) {
$fnln = $line;
break;
}
}
echo $fnln . ", " . $mail . "<br>";
}
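The block-splitting approach above relies on the file using "\n" line endings. A variation that tolerates "\r\n" as well is sketched below; it swaps explode() for preg_split() with \R, and the sample string (Windows line endings, plus a second block with no Email line) is made up for illustration.

```php
<?php
// Hypothetical sample: Windows line endings, second block has no Email line.
$string = "Coulthard\r\nSally Coulthard\r\nLocation: Surrey\r\n"
        . "Email: sally@veterinaryphysio.co.uk\r\nTel: 07865095005"
        . "\r\n\r\n"
        . "Kate Haynes\r\nLocation: Surrey, Sussex, Kent";

$rows = [];
// \R matches \n, \r\n or \r, so two or more in a row mark a blank-line gap.
foreach (preg_split('/\R{2,}/', trim($string)) as $block) {
    $name = '';
    $mail = ''; // reset per block
    foreach (preg_split('/\R/', $block) as $line) {
        if (strpos($line, 'Email:') === 0) {
            $mail = trim(substr($line, strlen('Email:')));
        } elseif (strpos($line, ': ') === false) {
            $name = $line; // keep overwriting: the last label-free line wins
        }
    }
    $rows[] = "$name, $mail";
}
echo implode("\n", $rows), "\n";
```

Walking the lines forwards and overwriting $name gives the same "last line without a label" result as reversing the array, and a block with no Email line simply yields an empty address.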