Question

I have a file containing a list of people with their phone numbers and email addresses.

For example:

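Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk

Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk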

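The list goes on like this for hundreds of entries. How can I create a regular expression that reads the file from top to bottom, extracts the first-and-last-name line and the email address, and puts them together like this: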

Name, email address

Any help would be great.

I have the following code, but it only reads the email addresses:

$string = file_get_contents("physio.txt"); // Load text file contents

// don't need to preassign $matches, it's created dynamically

// this regex handles more email address formats like a+b@google.com.sg, and the i makes it case insensitive
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';

// preg_match_all returns the number of matches and fills $matches (an array of arrays) by reference
preg_match_all($pattern, $string, $matches);

// the data you want is in $matches[0], dump it with var_export() to see it
echo "<pre>";
$input = $matches[0];
echo count($input);
echo "<br>";
$result = array_unique($input);
echo count($result);
echo "<br>";
//print_r($result);
echo "</pre>";
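For reference, preg_match_all() does not return the matches themselves: it returns the number of full matches (or false on failure) and fills $matches by reference. With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches and $matches[1] holds the first capture group, which is how a pattern can pull out only the part after a label. A minimal sketch on a made-up two-line sample string (not the real physio.txt):

```php
<?php
// Made-up sample text; in the question the text comes from physio.txt.
$sample = "Email: sally@veterinaryphysio.co.uk\nEmail: katehaynesphysio@yahoo.co.uk";

// /m lets ^ and $ match at each line; group 1 captures the address after "Email: ".
$count = preg_match_all('/^Email: (\S+)$/m', $sample, $matches);

echo $count, "\n";      // 2
print_r($matches[0]);   // full matches, including the "Email: " label
print_r($matches[1]);   // capture group 1 only: the bare addresses
```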

Answer 1

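A regular expression seems like a sensible way to parse this data. The important thing is to include enough literal components in the pattern to keep the matches accurate.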

I recommend the following:

Pattern: ~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m

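The neighbouring literal substrings Location: and Email: make sure the right substrings are targeted: only a line directly followed by a Location: line is captured as the name, and the email is taken from the next line that starts with Email:.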

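The m pattern modifier improves the pattern's accuracy by making ^ match at the start of every line, rather than only at the start of the whole string.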
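A minimal sketch of what the m modifier changes, using a made-up two-line string: without it, ^Email: cannot match on the second line.

```php
<?php
$text = "Kate Haynes\nEmail: katehaynesphysio@yahoo.co.uk";

// Without /m, ^ only matches at the very start of the string,
// so "^Email:" cannot match on the second line.
$withoutM = preg_match('/^Email: (\S*)/', $text, $m1);

// With /m, ^ also matches right after each newline.
$withM = preg_match('/^Email: (\S*)/m', $text, $m2);

var_dump($withoutM); // int(0)
var_dump($withM);    // int(1)
echo $m2[1], "\n";   // katehaynesphysio@yahoo.co.uk
```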

Breakdown:

~          #pattern delimiter
^          #match start of a line
(.+)       #capture one or more non-newline characters (Capture Group #1)
\R         #match a newline character (\r, \n, \r\n)
Location:  #match literal: "Location" followed by colon
[\s\S]*?   #match (lazily) zero or more of any character
^Email:    #match start of a line, literal: "Email", colon, space
(\S*)      #capture zero or more non-whitespace characters (Capture Group #2 -- quantifier means the email value can be blank and still valid)
~          #pattern delimiter
m          #pattern modifier tells regex engine that ^ means start of a line instead of start of the string
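The lazy quantifier in [\s\S]*? matters as much as the m modifier. A small sketch (sample text trimmed down from the question's data) comparing the pattern above with a greedy variant that differs only by the missing ?:

```php
<?php
$input = "Sally Coulthard\nLocation: Surrey\nEmail: sally@veterinaryphysio.co.uk\n\n"
       . "Kate Haynes\nLocation: Kent\nEmail: katehaynesphysio@yahoo.co.uk";

$lazy   = '~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m';
$greedy = '~^(.+)\RLocation:[\s\S]*^Email: (\S*)~m'; // same pattern without the "?"

$lazyCount   = preg_match_all($lazy, $input, $lazyMatches, PREG_SET_ORDER);
$greedyCount = preg_match_all($greedy, $input, $greedyMatches, PREG_SET_ORDER);

echo $lazyCount, "\n";   // 2: one match per record
echo $greedyCount, "\n"; // 1: a single match spanning both records
echo $greedyMatches[0][1], ", ", $greedyMatches[0][2], "\n";
// Sally Coulthard, katehaynesphysio@yahoo.co.uk  (wrong pairing)
```

With the greedy variant, [\s\S]* first runs to the end of the input and backtracks only as far as the last Email: line, so the first name gets paired with the last address.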

Code:

$input = "Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk

Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk";

if (preg_match_all("~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $data) {
        echo "{$data[1]}, {$data[2]}\n";
    }
}

Output:

Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk
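Since the question is about exporting from physio.txt, here is a sketch that wraps the same pattern in a function and writes the pairs to a CSV file with fputcsv(). The helper name extractContacts() and the output file name physio.csv are hypothetical, and the trim() calls are there on the assumption that the file might use Windows (\r\n) line endings.

```php
<?php
/**
 * Hypothetical helper: return [name, email] pairs found by the pattern above.
 * trim() guards against a stray "\r" if the file has Windows line endings.
 */
function extractContacts(string $text): array
{
    $pairs = [];
    if (preg_match_all('~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m', $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $pairs[] = [trim($m[1]), trim($m[2])];
        }
    }
    return $pairs;
}

// physio.txt is the question's input file; physio.csv is an assumed output name.
if (is_file('physio.txt')) {
    $out = fopen('physio.csv', 'w');
    if ($out !== false) {
        foreach (extractContacts(file_get_contents('physio.txt')) as $row) {
            // fputcsv() adds quoting where needed; the empty escape argument
            // (PHP >= 7.4) turns off the non-standard backslash escaping.
            fputcsv($out, $row, ',', '"', '');
        }
        fclose($out);
    }
}
```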

Answer 2

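You can split the content on double newlines and then process each block. To get the first and last name, take the last line of the block that does not contain ": ":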

$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
    $lines = explode("\n", $block);
    $mail = end($lines);
    $mail = substr($mail, strlen('Email: '));
    $lines = array_reverse($lines);
    $fnln = '';
    foreach ($lines as $line) {
        if (strpos($line, ': ') === false) {
            $fnln = $line;
            break;
        }
    }
    echo $fnln . ", " . $mail . "<br>";
}

Output:

Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk
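One caveat with explode("\n\n", ...): if the file was saved with Windows line endings, blocks are separated by "\r\n\r\n" and nothing gets split. A sketch of the same approach that splits with preg_split() and \R (any newline sequence) instead, still assuming the email is the last line of each block; namesAndEmails() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: split $text into blocks on blank lines, accepting
 * \n, \r\n or \r line endings, and return "Name, email" strings.
 * Assumes the email is the last line of each block.
 */
function namesAndEmails(string $text): array
{
    $result = [];
    // A blank line = newline, optional spaces/tabs, newline.
    $blocks = preg_split('/\R[ \t]*\R/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($blocks as $block) {
        $lines = preg_split('/\R/', $block);
        $mail  = substr(end($lines), strlen('Email: '));
        // The name is the last line that does not contain ": ".
        $name = '';
        foreach (array_reverse($lines) as $line) {
            if (strpos($line, ': ') === false) {
                $name = $line;
                break;
            }
        }
        $result[] = $name . ', ' . $mail;
    }
    return $result;
}
```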

Alternatively, if the email is not always the last line of the block:

$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
    $lines = explode("\n", $block);
    $lines = array_reverse($lines);
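    $fnln = '';
    $mail = ''; // reset for every block so a block without an Email line doesn't reuse the previous address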
    foreach ($lines as $line) {
        if (substr($line, 0, 6) == 'Email:') {
            $mail = substr($line, 7);
        }
        if (strpos($line, ': ') === false) {
            $fnln = $line;
            break;
        }
    }
    echo $fnln . ", " . $mail . "<br>";
}
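Another option when the email can appear anywhere in a block is to read every "Label: value" line into an associative array and look up Email by key. A sketch, assuming (like this answer) that the name is the last line without ": " and that each label appears at most once per block; parseBlock() is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: parse one block into [name, fields], where fields
 * maps each label to its value (e.g. "Email" => "sally@...").
 */
function parseBlock(string $block): array
{
    $name = '';
    $fields = [];
    foreach (preg_split('/\R/', trim($block)) as $line) {
        $pos = strpos($line, ': ');
        if ($pos === false) {
            $name = trim($line); // a later label-less line overrides an earlier one
        } else {
            $fields[substr($line, 0, $pos)] = trim(substr($line, $pos + 2));
        }
    }
    return [$name, $fields];
}

// Usage: the Email line no longer has to come last.
$block = "Sally Coulthard\nEmail: sally@veterinaryphysio.co.uk\nLocation: Surrey";
[$name, $fields] = parseBlock($block);
echo $name, ', ', $fields['Email'] ?? '', "\n"; // Sally Coulthard, sally@veterinaryphysio.co.uk
```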

PHP - Export names and email addresses from a file
