匹配格式并从字符串返回令牌

时间:2018-11-11 13:21:14

标签: php regex preg-match

我正在尝试使用PHP中的regex解析大型文本。我知道行格式,下面使用sprintf格式显示,以便于说明。

因此,一行包含一些已知的单词(或括号)。我想知道匹配的格式(在示例中,我打印了formats数组键)并从该行中提取了一些相关数据。

我尝试了诸如'/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/'之类的正则表达式格式,但是除了匹配之外,我无法从行中提取正确的数据。

$input = [
    'new message from Bob [22:105:3905:534]',
    'user Dylan posted a question in section General',
    'new message from Mary(gold) [19504:8728:18524:78941]'
];

$formats = [
    'new message from %s [%d:%d:%d:%d]', // this would actually be something like '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/'
    'user %s posted a question in section %s',
    'new message from %s(%s) [%d:%d:%d:%d]',
];

foreach ($input as $line) {
    foreach ($formats as $key => $format) {
        $data = [];
        if (preg_match($format, $line, $data)) {
            echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
            continue;
        }
    }
}

// should yield:
// format: 0, data: array ( 0 => 'Bob', 1 => 22, 2 => 105, 3 => 3905, 4 => 534, )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => 19504, 3 => 8728, 4 => 18524, 5 => 78941, )

我需要:

  1. 一种有效的正则表达式格式,用于使用多个通配符匹配一行
  2. 当regex格式匹配一行时提取通配符的方法(也许preg_match不是这种情况下最好的regex php函数)

我可以使用字符串函数(strpos和substr)来执行此操作,但是代码看起来很糟糕。

谢谢!

1 个答案:

答案 0 :(得分:0)

只需稍微调整一下图案即可。请参见下面的代码。

<?php

$input = [
    'new message from Bob [22:105:3905:534]',
    'user Dylan posted a question in section General with space',
    'new message from Mary(gold) [19504:8728:18524:78941]'
];

$formats = [
    '/new message from (\w+) \[(\d+):(\d+):(\d+):(\d+)\]/', // this would actually be something like '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/'
    '/user (\w+) posted a question in section ([\w ]+)/',
    '/new message from (\w+)\((\w+)\) \[(\d+):(\d+):(\d+):(\d+)\]/',
];

foreach ($input as $line) {
    foreach ($formats as $key => $format) {
        $data = [];
        if (preg_match($format, $line, $data)) {                            
            array_shift($data); 
            echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
            continue;
        }
    }
}

// should yield:
// format: 0, data: array ( 0 => 'Bob', 1 => 22, 2 => 105, 3 => 3905, 4 => 534, )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => 19504, 3 => 8728, 4 => 18524, 5 => 78941, )

https://3v4l.org/NBgaT

编辑:我添加了array_shift()来消除与完整模式匹配的文本。