Question

I'm trying to get a value (i.e. 105) from an HTML page using preg_match. Please check my HTML code:

<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>

I used the following regular expression:

$url = 'http://www.example.com/test.html';

preg_match('#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#', file_get_contents($url), $matches);

echo $matches[1];

But it doesn't return the correct value. Please help me fix the regular expression above. Thanks.

Answer 1

I don't recommend using regular expressions to parse HTML. Use a DOM parser instead. Read this rant for more information about why :)
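As a rough illustration of the DOM parser route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The helper name extractBacklinkCount() is hypothetical, and the sketch assumes the page keeps the structure shown in the question: a <p> holding the label, immediately followed by a sibling <p> that wraps the number in a <b>. It also assumes the number is a plain integer.

```php
<?php
// Hypothetical helper (not from the question): returns the integer that
// follows the "External Backlinks" label, or null when it cannot be found.
function extractBacklinkCount(string $html): ?int
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Find the <p> whose whitespace-normalized text is the label,
    // then take the <b> inside the first <p> sibling that follows it.
    $nodes = $xpath->query(
        '//p[normalize-space(.) = "External Backlinks"]/following-sibling::p[1]/b'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $text = trim($nodes->item(0)->textContent);
    return ctype_digit($text) ? (int) $text : null;
}

// Sample input copied from the question, indentation included.
$html = <<<HTML
<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>
HTML;

var_dump(extractBacklinkCount($html));
```

For a live page you would pass file_get_contents($url) instead of the sample string. Because normalize-space() collapses the surrounding whitespace, the indentation that trips up the regex does not matter here.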

To answer your question, here is a regex that works for your example:

<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>

It's ugly, but it works... Don't use it.

preg_match('#<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>#', file_get_contents($url), $matches);

echo $matches[1];

Output: 105

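The problem with your regex is that it doesn't account for the whitespace in the HTML source. You also didn't escape the slashes, although with # as the delimiter that is optional rather than required.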

If the source looked like this:

<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>

yours would have worked, but it's not very robust. (I guess one could argue that parsing HTML with regex is never very robust.)
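To make the whitespace point concrete, the sketch below runs the pattern from the question against both a compact and an indented version of the markup, then tries a variant that allows \s* between the tokens. The lenient pattern and the firstGroup() helper are assumptions added for illustration and are not part of the original answer.

```php
<?php
// Pattern from the question: expects the tags to be directly adjacent.
$strict = '#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#';

// Illustrative variant (an assumption): \s* tolerates any whitespace between tokens.
$lenient = '#<p>\s*External Backlinks\s*</p>\s*<p style="font-size: 150%;">\s*<b>\s*([0-9.]+)#';

// The same markup, once on a single line and once with line breaks and indentation.
$compact = '<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>';
$indented = "<p>\n    External Backlinks\n</p>\n<p style=\"font-size: 150%;\">\n    <b>105</b>\n</p>";

// Hypothetical helper: returns capture group 1, or null when the pattern does not match.
function firstGroup(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

var_dump(firstGroup($strict, $compact));   // strict pattern, compact source
var_dump(firstGroup($strict, $indented));  // strict pattern, indented source
var_dump(firstGroup($lenient, $indented)); // lenient pattern, indented source
```

Only the strict pattern on the indented source should come back empty. The lenient variant still depends on the exact style attribute and tag order, so it shares the fragility described above.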
