Question

I want to extract the number between the span HTML tags. The number can change!

<span class="topic-count">
  ::before
  "
             24
          "
  ::after
</span>

I tried the following code:

preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]);

But it doesn't work.

The whole code:

$result=array();
$page = 201;
while ($page>=1) {
    $source = file_get_contents ("http://www.jeuxvideo.com/forums/0-27047-0-1-0-".$page."-0-counter-strike-global-offensive.htm");
    preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]);
    $result = array_merge($result, $nombre[$i][1]);
    print("Page : ".$page ."\n");
    $page-=25;
}
print_r ($nombre);

Answer 1

You can use

preg_match_all(
    '#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', 
    $html, 
    $matches
);

which captures the digits that appear before the closing </span> tag.
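The difference between the two patterns can be checked on a small sample. This is a minimal sketch, assuming the real page source holds the number surrounded by whitespace and newlines (the ::before, ::after and quote lines in the question are only how the browser's developer tools display the element; the sample string itself is hypothetical). Without the s modifier, "." does not match a newline, so the question's (.*?) never reaches </span>; the answer's [^\d]* skips any non-digit characters, newlines included.

```php
<?php
// Hypothetical sample markup: two spans with the number on its own line.
$html = "<span class=\"topic-count\">\n             24\n          </span>\n"
      . "<span class=\"topic-count\">\n             7\n          </span>";

// The question's pattern: "." stops at "\n" (no s modifier), so nothing matches.
$original = preg_match_all('#<span class="topic-count">(.*?)</span>#', $html, $m1);

// The answer's pattern: skip non-digits, capture the digits, skip to </span>.
preg_match_all('#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', $html, $m2);

echo $original, PHP_EOL;             // 0
echo implode(', ', $m2[1]), PHP_EOL; // 24, 7
```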

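Note, however, that this regex only works for this exact piece of HTML. If the markup changes even slightly, for example an additional class or another attribute on the span, the pattern will no longer match. Writing reliable regular expressions for HTML is hard.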

That is why it is better to use a DOM parser instead, for example:

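libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.jeuxvideo.com/forums/0-27047-0-1-0-1-0-counter-strike-global-offensive.htm');
libxml_clear_errors();              // discard the collected warnings
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
// every <span> whose class attribute contains "topic-count"
foreach ($xpath->evaluate('//span[contains(@class, "topic-count")]') as $node) {
    // print the first run of digits in the span's text, if there is one
    if (preg_match('#\d+#', $node->nodeValue, $topics)) {
        echo $topics[0], PHP_EOL;
    }
}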
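The DOM approach can also be plugged into the question's pagination loop. The sketch below is one possible way to do it, not the answerer's code: extractTopicCounts and collectAllTopicCounts are hypothetical helper names, the URL scheme and page offsets are copied from the question, and fetching with file_get_contents assumes allow_url_fopen is enabled. It also replaces the undefined $nombre[$i] with a single result array.

```php
<?php
// Hypothetical helper: returns every number found inside
// <span> elements whose class contains "topic-count".
function extractTopicCounts(string $html): array
{
    if (trim($html) === '') {
        return [];   // DOMDocument::loadHTML() rejects an empty string
    }
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $counts = [];
    foreach ($xpath->query('//span[contains(@class, "topic-count")]') as $node) {
        if (preg_match('#\d+#', $node->textContent, $m)) {
            $counts[] = (int) $m[0];
        }
    }
    return $counts;
}

// Same offsets as the question's loop (201, 176, ..., 26, 1),
// merging the counts of every page into one array.
function collectAllTopicCounts(): array
{
    $result = [];
    for ($page = 201; $page >= 1; $page -= 25) {
        $url = 'http://www.jeuxvideo.com/forums/0-27047-0-1-0-' . $page
             . '-0-counter-strike-global-offensive.htm';
        $source = file_get_contents($url);   // requires allow_url_fopen
        if ($source === false) {
            continue;                        // skip pages that fail to load
        }
        $result = array_merge($result, extractTopicCounts($source));
        echo 'Page : ', $page, PHP_EOL;
    }
    return $result;
}
```

Calling print_r(collectAllTopicCounts()); would then fetch all nine pages and print the combined list. Keeping the parsing in its own function makes it testable on a plain string without touching the network.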

DOMDocument parses the entire page into a tree of nodes, which you can then query conveniently with XPath. Note that the expression

//span[contains(@class, "topic-count")]

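returns all span elements whose class attribute contains the string topic-count. The loop then echoes the number from each of those nodes that contains one. Because contains() is a plain substring test, it also matches class names such as topic-count-label.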
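If only the exact class token topic-count should match, the usual XPath 1.0 idiom is to pad the normalized class attribute with spaces and search for " topic-count ". This is a minimal sketch run against a hypothetical sample string (loadHTML instead of loadHTMLFile, so no network access is needed); the class names topic-count-label and hot are invented for the example.

```php
<?php
// Hypothetical sample markup standing in for the forum page.
$html = '<ul>'
      . '<li><span class="topic-count"> 24 </span></li>'
      . '<li><span class="topic-count hot">  7 </span></li>'
      . '<li><span class="topic-count-label">Replies 3</span></li>'
      . '</ul>';

libxml_use_internal_errors(true);   // silence warnings about imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// Matches "topic-count" only as a whole class token,
// so the "topic-count-label" span is skipped.
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " topic-count ")]';

$counts = [];
foreach ($xpath->query($query) as $node) {
    if (preg_match('#\d+#', $node->textContent, $m)) {
        $counts[] = (int) $m[0];
    }
}

echo implode(', ', $counts), PHP_EOL;  // 24, 7
```

With the plain contains(@class, "topic-count") query, the third span would match as well and its 3 would be echoed.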
