Question

I'm trying to isolate and extract a number from a string that contains two similar HTML statements:

1 - <td class="center"><p class="texte">1914</p></td>

2 - <td class="center"><p class="texte">135.000</p></td>

So I'm looking for the number 135.000, not the number 1914.

Important: this is not US number notation. 135.000 is actually one hundred thirty-five thousand (135,000).
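Because the dot is a thousands separator here, such a value can be validated and turned into an integer by checking the grouping and then stripping the dots. A small PHP sketch; the helper name parseDottedThousands is hypothetical, and it assumes the strings never carry a decimal part:

```php
<?php
// Hypothetical helper: turn a dot-grouped number such as "135.000" into an int.
// Assumes '.' is only ever a thousands separator (no decimal part).
function parseDottedThousands(string $s): ?int
{
    // 1-3 leading digits, then zero or more ".ddd" groups, nothing else
    if (preg_match('~^[1-9][0-9]{0,2}(?:\.[0-9]{3})*$~', $s) !== 1) {
        return null; // not in the expected format
    }
    // Remove the separators and cast the remaining digits
    return (int) str_replace('.', '', $s);
}

var_dump(parseDottedThousands('135.000')); // int(135000)
var_dump(parseDottedThousands('1914'));    // NULL
```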

I tried something like ([1-9][0-9]{1,2}), but it captures 191 from statement 1 above, which is not what I want.

Thanks
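The attempt catches 191 because nothing anchors it: [1-9][0-9]{1,2} is free to match the first three digits of 1914. A minimal PHP sketch contrasting it with a pattern pinned between > and <; concatenating the two rows into one string and assuming the number always sits directly inside the tag are assumptions, not something the question guarantees:

```php
<?php
// The two rows from the question, concatenated into one string.
$html = '<td class="center"><p class="texte">1914</p></td>'
      . '<td class="center"><p class="texte">135.000</p></td>';

// Original attempt: nothing anchors it, so it matches the first
// 2-3 digit run anywhere, i.e. the "191" inside "1914".
preg_match('~([1-9][0-9]{1,2})~', $html, $m);
echo $m[1], PHP_EOL; // 191

// Anchored variant (assumption: the number sits directly between '>' and '<').
// After at most 3 digits the pattern needs either ".ddd" or '<', so a plain
// 4-digit run like 1914 can no longer match.
preg_match_all('~(?<=>)[1-9][0-9]{0,2}(?:\.[0-9]{3})*(?=<)~', $html, $all);
echo implode(', ', $all[0]), PHP_EOL; // 135.000
```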

Answer 1

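You're dealing with HTML, so parse it with an HTML parser first (XPath is your friend). Then use the preg_match function to keep only the numbers in the required format. For example: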

$dom = new DOMDocument;
$dom->loadHTML($yourHtmlString);

$xp = new DOMXPath($dom);

// you need to register the function `preg_match` to use it in your xpath query
$xp->registerNamespace("php", "http://php.net/xpath");
$xp->registerPhpFunctions('preg_match');

// The xpath query
$targetNodeList = $xp->query('//td[@class="center"]/p[@class="texte"][php:functionString("preg_match", "~^[1-9][0-9]{0,2}(?:\.[0-9]{3})*$~", .) > 0]');
#                             ^                                     ^^                                                                             ^
#                             '------------------+------------------''-----------------------------------+-----------------------------------------'
#                                                '- describe the path in the DOM tree                    |
#                                                                                                        '- predicate to check the content format  

foreach ($targetNodeList as $node) {
    echo $node->nodeValue, PHP_EOL;
}
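If registering PHP functions inside XPath feels heavy, the same filter can run in plain PHP after the query. A self-contained sketch using the question's two rows; wrapping them in <table><tr> and calling libxml_use_internal_errors(true) are assumptions added so the fragment parses quietly:

```php
<?php
// The two rows from the question, wrapped in a table (assumption).
$html = '<table><tr>'
      . '<td class="center"><p class="texte">1914</p></td>'
      . '<td class="center"><p class="texte">135.000</p></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$found = [];
foreach ($xp->query('//td[@class="center"]/p[@class="texte"]') as $node) {
    // Keep only values like 135.000 or 1.234.567 (dot as thousands separator)
    if (preg_match('~^[1-9][0-9]{0,2}(?:\.[0-9]{3})*$~', $node->nodeValue) === 1) {
        $found[] = $node->nodeValue;
    }
}

echo implode(PHP_EOL, $found), PHP_EOL; // 135.000
```

Keeping the regex outside the XPath expression avoids registerNamespace and registerPhpFunctions, at the cost of filtering after the query instead of inside it.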

Answer 2

Give it a try :)

\s*[\d.]+(?=<)
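As written, \s*[\d.]+(?=<) matches any run of digits and dots that sits directly before <, so it returns 1914 as well as 135.000. A PHP sketch using the question's two rows shows this, plus one possible tightening; the lookbehind and the mandatory .ddd group in the second pattern are assumptions, not part of the answer:

```php
<?php
// The two rows from the question, concatenated into one string.
$html = '<td class="center"><p class="texte">1914</p></td>'
      . '<td class="center"><p class="texte">135.000</p></td>';

// Pattern from this answer: any run of digits/dots directly before '<'
preg_match_all('~\s*[\d.]+(?=<)~', $html, $loose);
echo implode(', ', $loose[0]), PHP_EOL; // 1914, 135.000

// Tightened variant: not preceded by a digit or dot, 1-3 digits,
// then at least one ".ddd" group, then '<'
preg_match_all('~(?<![\d.])\d{1,3}(?:\.\d{3})+(?=<)~', $html, $strict);
echo implode(', ', $strict[0]), PHP_EOL; // 135.000
```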

Here is the link: Regex Example

regexp php - negate 4-digit (year) numbers
