Question

I am trying to match:

<h1> sample string 123.456 - find me </h1>

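Note that what I'm interested in is the content between the h1 tags. The string is variable and can contain any combination of digits, letters, and/or other characters. So the same preg_match_all search also needs to find the following between h1 tags: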

<h1>there are no numbers this time</h1>

or

<h1>this one may be tricky ?!-.</h1>

Here is what I have tried so far:

preg_match_all("/<h1>[\w\d\D\s]+?<\/h1>$/siU", $input, $matches);
print_r($matches);

The script runs, but the $matches array contains no values when printed with print_r(). The output looks like 'Array ( [0] => Array ( ) )'.

Answer 1

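Using a parser is probably your best option. Your question and comments are unclear, and they contradict each other about what you are trying to identify.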

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html = '<h1>Hi</h1><h2>test</h2><strong>Test</strong><h1>More</h1>';
$doc->loadHTML($html);
libxml_use_internal_errors(false);
$h1s = $doc->getElementsByTagName('h1');
foreach ($h1s as $h1) {
    echo $h1->nodeValue . "\n";
}

You can then run a regex on the nodeValue to confirm that the value is what you expect.

Output:

Hi
More
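The nodeValue check mentioned above could look roughly like the sketch below. The helper name h1_values and the validation pattern are assumptions made for illustration; they are not part of the original answer, and the pattern should be adjusted to whatever characters you actually want to allow.

```php
<?php
// Hypothetical helper: collect the trimmed text of every <h1> element and
// keep only the values that pass a validation pattern.
function h1_values(string $html, string $pattern): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $values = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        $text = trim($h1->nodeValue);
        if (preg_match($pattern, $text) === 1) {
            $values[] = $text;
        }
    }
    return $values;
}

$html = '<h1> sample string 123.456 - find me </h1>'
      . '<h2>skip me</h2>'
      . '<h1>there are no numbers this time</h1>'
      . '<h1>this one may be tricky ?!-.</h1>';

// Assumed rule: letters, digits, whitespace and a few punctuation marks.
$values = h1_values($html, '/^[a-z0-9\s?!.\-]+$/i');
print_r($values);
```

With this approach the parser deals with the markup, and the regex only has to describe the text you accept.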

A regex for your original question might be:

<h1>[a-zA-Z\d]+?<\/h1>

Demo: https://regex101.com/r/lD5wQ3/1

Answer 2

The following matches all three strings:

<h1>\s?[a-z0-9\s?!.-]*<\/h1>
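To try a pattern of this shape against the three sample headings from the question, something like the following sketch may help. It assumes a character class that also lists the hyphen (two of the samples contain one) and adds the i modifier so uppercase letters would match too; both are choices made for this illustration.

```php
<?php
// The three sample headings from the question, one per line.
$input = <<<HTML
<h1> sample string 123.456 - find me </h1>
<h1>there are no numbers this time</h1>
<h1>this one may be tricky ?!-.</h1>
HTML;

// Assumed pattern: letters, digits, whitespace, ? ! . and - between the tags.
// "<" is not in the class, so a match cannot run past its own closing tag.
$pattern = '/<h1>\s?[a-z0-9\s?!.\-]*<\/h1>/i';

$count = preg_match_all($pattern, $input, $matches);
echo $count, "\n";      // 3 with the sample input above
print_r($matches[0]);   // the full matches, tags included
```

Any character that is not listed in the class (for example a comma or an ampersand) would make a heading fail to match, so the class has to be extended to cover the text you expect.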

Answer 3

The question is: what is the expected result? You could try this:

$input = '<h1> Alphanumeric value here </h1>';
preg_match_all("/^<h1>(.*)<\/h1>/su", $input, $matches);
print_r($matches);

Answer 4

preg_match_all("%^<h1>[a-zA-Z0-9\s]*</h1>$%siU", $input, $matches);

This returns the text inside the <h1> tags, so if you want the tags included, just do:

"<h1>".$result."</h1>"
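If both the inner text and the full tag are needed, a capture group is one way to get them in a single call. The sketch below is an assumption-laden illustration rather than the answer's own code: it uses a lazy (.*?) group and drops the ^ and $ anchors so that headings anywhere in the input are found.

```php
<?php
$input = "<h1> sample string 123.456 - find me </h1>\n"
       . "<h1>this one may be tricky ?!-.</h1>";

// Group 1 captures the inner text; the \s* outside the group keeps
// leading and trailing whitespace out of the capture.
preg_match_all('%<h1>\s*(.*?)\s*</h1>%si', $input, $matches);

print_r($matches[0]);   // full matches, tags included
print_r($matches[1]);   // inner text only
```

Here $matches[0] already holds the text with its tags, so there is no need to concatenate them back on.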

Struggling to match a string with preg_match_all()
