Question

I want to find a model number in the following meta tag of an HTML document:

<meta name="description" content="Model AB-1234. Model description here" />

I only want to match the model number (AB-1234). I have tried several things; here are two of them:

preg_match('/<meta name="description" content="\bmodel\b(.*)"/i', $html, $model);

This one returns: AB-1234. Model description here

==================================================================================

preg_match('/<meta name="description" content="(.*)"/i', $html, $model);

This one returns: Model AB-1234. Model description here

There is probably a way to stop at the . (dot), but I don't know how.

Thanks,

Answer 1

You can use:

preg_match('/<meta name="description" content="model\s++\K[^.]++/i',
           $html, $model);
print_r($model);

Explanation:

/<meta name="description" content="model
\s++    # one or more spaces, tabs, newlines (possessive)
\K      # reset the start of the reported match (drop everything matched so far)
[^.]++  # all that is not a dot one or more times (possessive)
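To make the role of `\K` concrete, here is a minimal sketch (assuming PHP's PCRE functions and the sample tag from the question) that compares it with an ordinary capturing group. The variable names `$withK` and `$withGroup` are illustrative only.

```php
<?php
// Sample input taken from the question.
$html = '<meta name="description" content="Model AB-1234. Model description here" />';

// With \K, everything matched before it is dropped from the reported
// match, so the model number ends up in $withK[0].
preg_match('/<meta name="description" content="model\s++\K[^.]++/i', $html, $withK);

// Equivalent idea with a capturing group: the full match is in
// $withGroup[0] and the model number is in $withGroup[1].
preg_match('/<meta name="description" content="model\s+([^.]+)/i', $html, $withGroup);

echo $withK[0], "\n";     // AB-1234
echo $withGroup[1], "\n"; // AB-1234
```

Both give the same result here; `\K` simply saves you from indexing into a group.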

More information about possessive quantifiers

Note that it is safer to extract the attribute content with DOM first and then use a regex to find the model. For example:

$html = <<<LOD
<meta name="description" content="Model AB-1234. Model description here" />
LOD;

$doc = new DOMDocument();
$doc->loadHTML($html);
// Read the content attribute of the first <meta> element
$content = $doc->getElementsByTagName('meta')->item(0)->getAttribute('content');

// $model[0] holds the text after "Model " up to the first dot (AB-1234)
preg_match('/model\s++\K[^.]++/i', $content, $model);

Answer 2

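// $model[1] holds the model number; no closing quote is required after it,
// because the next character in the attribute is the dot
preg_match('/<meta name="description" content="model\s+([^."]*)/i', $html, $model);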

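In general, it is best not to use regular expressions to parse HTML, because they are very sensitive to the exact layout of the markup. A better approach is to use a DOM parsing library: extract the content attribute, and then use a regular expression to pull out the part of it you need.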
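As a sketch of that DOM-first approach, the following uses PHP's built-in `DOMDocument` and `DOMXPath` to select the `content` attribute of the `<meta name="description">` tag and then applies a small regex to it. The sample HTML and the `$attr`/`$m` variable names are assumptions for illustration.

```php
<?php
// Collect libxml parse warnings instead of printing them (real-world HTML is often messy).
libxml_use_internal_errors(true);

$html = '<html><head><meta name="description" content="Model AB-1234. Model description here" /></head></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Locate the content attribute of <meta name="description"> by its name,
// not by its position in the document or the exact text of the tag.
$xpath = new DOMXPath($doc);
$attr = $xpath->query('//meta[@name="description"]/@content')->item(0);

$model = null;
if ($attr !== null && preg_match('/model\s+([^.]+)/i', $attr->value, $m)) {
    $model = $m[1];
}

echo $model, "\n"; // AB-1234
```

Because the tag is found by its `name` attribute, this keeps working if `content` appears before `name` or the quoting changes, which the pure-regex patterns would not tolerate.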

Answer 3

$str = '<meta name="description" content="Model AA-1234. Model description here" />

<meta name="description" content="Model AB-1234. Model description here" />

<meta name="description" content="Model AC-1234. Model description here" />

<meta name="description" content="Model AD-1234. Model description here" />
';

preg_match_all('/content="Model (.*?)\./is', $str, $data);
if (!empty($data[1])) {
    $models = $data[1];
    print_r($models);
}

// Result

Array ( [0] => AA-1234 [1] => AB-1234 [2] => AC-1234 [3] => AD-1234 )
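The `?` in `(.*?)` is what makes this pattern stop at the first dot, which is the behaviour the question asks for. A minimal sketch, using a shortened two-tag version of `$str` (an assumption for brevity), contrasting the lazy pattern with its greedy counterpart:

```php
<?php
$str = '<meta name="description" content="Model AA-1234. Model description here" />
<meta name="description" content="Model AB-1234. Model description here" />';

// Lazy: (.*?) stops at the first dot after each "Model ", giving one match per tag.
$lazyCount = preg_match_all('/content="Model (.*?)\./is', $str, $lazy);

// Greedy: (.*) with the /s flag runs on to the LAST dot in the whole string,
// swallowing both tags into a single match.
$greedyCount = preg_match_all('/content="Model (.*)\./is', $str, $greedy);

echo implode(', ', $lazy[1]), "\n"; // AA-1234, AB-1234
echo $greedyCount, "\n";            // 1
```

A negated class such as `([^.]+)` gives the same result as the lazy version here without relying on backtracking.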

How to match the following content in a string using preg_match
