Suppose the string is
<li>Your browser may be missing a required plug-in contained in <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. Please reload this page after installing the missing component.<br />If this error persists, you can also save a copy of <a href="test.pdf">
The regular expression I wrote is
/href=.*?.pdf/
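This matches from the first 'href' all the way to '.pdf'. I need the match to start at the second href. In other words, it should capture only the href whose value ends in .pdf.
How can I solve this with a regular expression?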
Answer 0 (score: 2)
Answer 1 (score: 2)
You should parse HTML or XML with a DOM parser rather than with regular expressions. PHP provides the DOMDocument class for this:
$doc = new DOMDocument();
$doc->loadHTML('<li>Your browser may be missing a required plug-in contained in <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. Please reload this page after installing the missing component.<br />If this error persists, you can also save a copy of <a href="http://www.police.vt.edu/VTPD_v2.1/crime_stats/crime_logs/data/VT_2011-01_Crime_Log.pdf">');
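// Collect every <a> element, then print only the hrefs that end in ".pdf".
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    if (preg_match('/\.pdf$/i', $href)) {
        echo $href, PHP_EOL;
    }
}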
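If a regular expression is still preferred, it helps to see why the original pattern fails: a lazy quantifier only shortens the match from a given starting point, and the engine always tries the leftmost starting point first, so /href=.*?.pdf/ begins at the first href. The unescaped dot before pdf also matches any character. A minimal sketch of a pattern that stays inside a single attribute value follows. It assumes the href values are double-quoted and end directly in .pdf (no query string), and findPdfHrefs is a hypothetical helper name, not something from the original answer.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns every
// double-quoted href value that ends in ".pdf".
function findPdfHrefs(string $html): array
{
    // [^"]* cannot cross a closing quote, so the match stays inside one
    // attribute value; \. matches a literal dot; the i flag also accepts ".PDF".
    preg_match_all('/href="([^"]*\.pdf)"/i', $html, $matches);
    return $matches[1];
}

// Shortened version of the string from the question.
$html = '<a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. '
      . 'You can also save a copy of <a href="test.pdf">';

print_r(findPdfHrefs($html));
```

For the shortened string above, the first href is rejected because its value contains no .pdf before the closing quote, so only test.pdf is returned. This remains more fragile than the DOM approach, for example with single-quoted attributes.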