I want to parse an HTML document and get the nicknames of all users.
They appear in the following format:
<a href="/nickname_u_2412477356587950963">Nickname</a>
How can I do this in PHP using a regular expression? I cannot use DOMElement or simple HTML parsing.
Answer 0 (score: 3)
Here is a working solution that does not use regular expressions:
DOMDocument::loadHTML() is forgiving enough to handle malformed HTML.
<?php
$doc = new DOMDocument;
$doc->loadHTML('<a href="/nickname_u_2412477356587950963">Nickname</a>');
$xpath = new DOMXPath($doc);
// Select every <a> element whose href starts with "/nickname"
$nodes = $xpath->query('//a[starts-with(@href, "/nickname")]');
foreach ($nodes as $node) {
    $username = $node->textContent;       // link text, i.e. the nickname
    $href = $node->getAttribute('href');  // e.g. /nickname_u_2412477356587950963
    printf("%s => %s\n", $username, $href);
}
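To illustrate the point about malformed HTML, here is a minimal sketch that wraps the same XPath approach in a function and silences libxml's parse warnings. The function name `extractNicknames` and the sample markup are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: the name and the return shape are illustrative only.
function extractNicknames(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//a[starts-with(@href, "/nickname_u_")]') as $node) {
        // Map href => nickname text.
        $result[$node->getAttribute('href')] = trim($node->textContent);
    }
    return $result;
}

// Malformed sample: the <p> and <b> tags are never closed.
$html = '<p><a href="/nickname_u_2412477356587950963">Nickname</a><b>'
      . '<a href="/other">skip me</a>';
$nicknames = extractNicknames($html);
print_r($nicknames);
```

Keying the result by href keeps each nickname tied to its profile link; if two links share an href, the later one overwrites the earlier one.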
Answer 1 (score: 3)
preg_match_all(
    '{                  # match when
        nickname_u_     # there is nickname_u_
        \d+             # followed by one or more digits
        ">              # followed by a quote and a closing bracket
        (.*?)           # capture anything that follows (non-greedy)
        </a>            # until the first </a> sequence
    }x',
    '<a href="/nickname_u_2412477356587950963">Nickname</a>',
    $matches
);
// $matches[0] holds the full matches, $matches[1] the captured nicknames
print_r($matches);
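The usual disclaimers about using a regex on HTML instead of an HTML parser apply. The pattern above could probably be improved to match more reliably. It will work for the example you gave, though.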
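As one possible way to make the match more reliable, the sketch below tolerates extra attributes, single or double quotes, mixed-case tags, and nested markup inside the link. The helper name `matchNicknames` and the particular variations it tolerates are assumptions; a regex still cannot cover every form of HTML.

```php
<?php
// Hypothetical helper: the name and the tolerated variations are assumptions.
function matchNicknames(string $html): array
{
    $pattern = '~
        <a\b[^>]*?            # opening <a tag, any attributes before href
        \bhref\s*=\s*
        (["\'])               # group 1: opening quote, single or double
        (/nickname_u_\d+)     # group 2: the profile path
        \1                    # the same quote closes the value
        [^>]*>                # rest of the opening tag
        (.*?)                 # group 3: link content, non-greedy
        </a>
    ~xis';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // Strip nested tags such as <b> from the link content.
        $result[$m[2]] = trim(strip_tags($m[3]));
    }
    return $result;
}

$sample = '<A class="u" href=\'/nickname_u_1\'>Alice</A> '
        . '<a href="/nickname_u_22" title="x"><b>Bob</b></a> '
        . '<a href="/about">About</a>';
$found = matchNicknames($sample);
print_r($found);
```

The `s` flag lets the link content span several lines, and `i` accepts upper-case tags. Links whose href does not begin with `/nickname_u_` followed by digits are skipped.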