Regex in PHP preg_match_all behaves differently from regex101.com

Date: 2017-07-10 15:50:18

Tags: php regex

I am trying to get the numbers into an array. Here are my string and my code.

$split_times = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

preg_match_all("/split1\\\'>(\d+(?:\.\d+)?)</", $split_times, $split_times_distances);
print_r($split_times_distances);

It should return an array like the following:

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)

Instead, it only returns the first index of both arrays.

3 answers:

Answer 0: (score: 1)

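Your regex does not match =\'split1\'>1:36.88<, because the capture group does not allow colons.

You have to add (?:\d+:){0,2} at the start of the capture group: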

$split_times = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

preg_match_all("/split1\\\'>((?:\d+:){0,2}\d+(?:\.\d+)?)</", $split_times, $split_times_distances);
//                    here __^^^^^^^^^^^^^
print_r($split_times_distances);

Output:

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)
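
To see why the original pattern stopped after the first cell, it can help to test each value on its own. The following is only a minimal sketch: the $cells array and the anchored patterns $original and $fixed are names made up for this illustration, built from the capture groups in the question and in this answer.

```php
<?php
// Values taken from the split1 cells of the question's string.
$cells = ['28.86', '1:01.56', '1:36.88', '2:59:09.93'];

// Capture-group part of the question's pattern, anchored for a per-value test.
$original = '/^\d+(?:\.\d+)?$/';
// Same pattern with up to two optional "digits:" prefixes, as in this answer.
$fixed = '/^(?:\d+:){0,2}\d+(?:\.\d+)?$/';

$results = [];
foreach ($cells as $cell) {
    $results[$cell] = [
        'original' => preg_match($original, $cell) === 1,
        'fixed'    => preg_match($fixed, $cell) === 1,
    ];
}

// Print one line per value showing which pattern accepts it.
foreach ($results as $cell => $r) {
    printf(
        "%-11s original: %-3s fixed: %s\n",
        $cell,
        $r['original'] ? 'yes' : 'no',
        $r['fixed'] ? 'yes' : 'no'
    );
}
```

Only 28.86 contains no colon, so it is the only value the original capture group accepts; the other three need the optional (?:\d+:){0,2} prefix.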

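Answer 1: (score: 1)

You already use DOMDocument to extract the string from the onMouse... attribute, so why not continue? Even without a dedicated Javascript parser, it is easy to extract the Javascript strings; after that, all you have to do is remove the escaped quotes to get the "raw" string: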

$onMouseAttr = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

# first step: extracting the strings

$stringPattern = <<<'EOD'
~ " ( [^"\\]* (?:\\.[^"\\]*)* ) "  |  ' ( [^'\\]* (?:\\.[^'\\]*)* ) ' ~xsS
EOD;

if ( preg_match_all($stringPattern, $onMouseAttr, $matches, PREG_SET_ORDER) ) {

    foreach ($matches as $match) {
        # unescape the string for the correct quote
        $html = isset($match[2]) ? str_replace("\\'", "'", $match[2])
                                 : str_replace('\\"', '"', $match[1]);

        # extract the nodes you want with DOMDocument/DOMXPath
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xp = new DOMXPath($dom);
        $nodeList = $xp->query('//td[@class="split1"]');
        foreach ($nodeList as $node) {
            # display them
            echo $node->nodeValue, PHP_EOL;
            # or store them
            # $results[] = $node->nodeValue;
        }
    }
}

Answer 2: (score: 0)

What I did: simply select all the characters between two strings. In this case, based on the <td> class name.

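// preg_match_all() returns the number of matches, so print the filled array instead
preg_match_all("/split1\\\'>(.*?)</", $split_times, $split_times_distances);
print_r($split_times_distances);

Output: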

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)
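
Since (.*?) accepts any characters up to the next <, a slightly stricter variant is to capture only characters that are not <. The sketch below is an illustration rather than part of the original answer: the shortened $input and the names $pattern, $sameAsDoubleQuoted and $times are assumptions for this example. It writes the pattern in a nowdoc, where PHP does no escape processing, so the text is exactly what one would type into a tester such as regex101.com.

```php
<?php
// Shortened input (an assumption for brevity): two split1 cells and one split2 cell.
// Inside double quotes PHP keeps \' as a literal backslash followed by a quote.
$input = "<td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td>"
       . "<td class=\'split1\'>2:59:09.93</td>";

// Nowdoc: no escape processing, so this is exactly the regex PCRE receives.
$pattern = <<<'REGEX'
/split1\\'>([^<]*)</
REGEX;

// The double-quoted spelling used in the answers yields the same pattern string.
$sameAsDoubleQuoted = ($pattern === "/split1\\\'>([^<]*)</");

// Collect only the captured cell contents.
preg_match_all($pattern, $input, $m);
$times = $m[1];

var_dump($sameAsDoubleQuoted);
print_r($times);
```

Comparing the two spellings shows how the three backslashes in the double-quoted version collapse to two in the pattern PCRE sees, which is a common source of differences between PHP and an online tester.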