Question

I'm having some trouble writing a regular expression to match the previous names on this page: http://steamcommunity.com/id/TripleThreat/namehistory

To be clear, I'd like to end up with the following in an array:

TripleThreat
[FD] TripleThreat.blyat
9

And so on...

I've already tried writing the regex myself, but it was a disaster (regex is something I struggle with).

This is what I wrote:

$page = file_get_contents(sprintf("http://steamcommunity.com/id/TripleThreat/namehistory"));

preg_match_all("/<span class=\"historyDash\">-<\/span>((.|\n)*)<\/div>/", $page, $matches);

foreach($matches[0] as $match) {
    echo($match . "<br/>");
}

Any help is much appreciated :)

Answer 1

You can try the following regex (the match is in the first capture group):

"/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/"

See it on Regex101.

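Changes I made: added \s* before and after the group to trim whitespace, and changed . to [^\<] so that only non-tag content (i.e. the actual text) is matched. The quantifier is also lazy (*?), so each match stops at the nearest </div> instead of running on to the last one on the page. Since [^\<] already matches newlines, the |\n alternative is harmless but redundant.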
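To show where the names end up, here is a minimal sketch that runs the regex above against a small sample. The markup in $page is an assumption reconstructed from the pattern (the historyItem and historyDate classes are hypothetical), so the real page may differ. The point is to read $matches[1] (the capture group) rather than $matches[0] (the whole match, tags included).

```php
<?php
// Hypothetical sample markup; the real Steam page may be structured differently.
$page = <<<HTML
<div class="historyItem">
    <span class="historyDate">Jan 1</span>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div class="historyItem">
    <span class="historyDate">Jan 2</span>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

// Same pattern as in the answer above.
$pattern = "/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/";

preg_match_all($pattern, $page, $matches);

// $matches[0] holds the full matches (including the tags);
// $matches[1] holds only the first capture group, i.e. the trimmed names.
$names = $matches[1];

foreach ($names as $name) {
    echo $name . "\n";
}
```

In the original loop, swapping $matches[0] for $matches[1] should be the only change needed once the pattern is replaced.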

Note: as @PedroLobito pointed out, don't parse HTML with regex unless necessary. Use a library to parse the DOM whenever possible. I've only provided a simple example that extends your attempt, and it may not be the best solution.
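For comparison, a DOM-based version might look roughly like the sketch below. It uses PHP's built-in DOMDocument and DOMXPath and assumes the same hypothetical markup, in which each name is the text node directly following the historyDash span; adjust the XPath if the real page is laid out differently.

```php
<?php
// Hypothetical sample markup; the real Steam page may be structured differently.
$html = <<<HTML
<div class="historyItem">
    <span class="historyDate">Jan 1</span>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div class="historyItem">
    <span class="historyDate">Jan 2</span>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// For every dash span, take the first text node that follows it as a sibling.
$nodes = $xpath->query('//span[@class="historyDash"]/following-sibling::text()[1]');

$names = [];
foreach ($nodes as $node) {
    // Strip the surrounding indentation and newlines.
    $names[] = trim($node->nodeValue);
}

print_r($names);
```

This avoids depending on exactly how whitespace and tags are arranged around the name, which is where regex-based approaches tend to break.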
