Using preg_match to find links in page source

Date: 2014-09-01 15:00:00

Tags: php url hyperlink preg-match

I want to use preg_match to turn this:

<li class="fte_newsarchivelistleft" style="clear: both; padding-left:0px;"><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015&nbsp;&raquo;&raquo;</a></li>
                      <li class="fte_newsarchivelistright" style="height: 25px;">komentarzy: <span class="fte_standardlink">[0]</span></li>

into this:

news,2480143,3-kolejka-sezonu-2014-2015.html

How can I do this? I have been trying with preg_match, but the link is too complicated...

1 answer:

Answer 0: (score: 0)

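Using preg_match for this really is too complicated. As has been said many times on this site: regular expressions and HTML don't mix well. Regex is not suited to processing markup. A DOM parser, however, is: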

$dom = new DOMDocument;//create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);//create XPath instance for dom, so we can query using xpath
$elemsWithHref = $xpath->query('//*[@href]');//get any node that has an href attribute
$hrefs = array();//all href values
foreach ($elemsWithHref as $node)
{
    $hrefs[] = $node->getAttributeNode('href')->value;//assign values
}

After this, working with the values in $hrefs is easy: it is an array of strings, each of which is the value of an href attribute.
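
If only the news link from the question is wanted, rather than every href on the page, the XPath expression can be narrowed. Below is a minimal sketch, assuming the anchor always carries the fte_standardlink class shown in the question. The helper name extractNewsLinks and the libxml_use_internal_errors calls are my own additions for illustration, not part of the code above:

```php
<?php
// Hypothetical helper: returns the href values of <a> elements whose class
// attribute contains "fte_standardlink" (class name taken from the question).
function extractNewsLinks($html)
{
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = array();
    // Select the href attribute nodes directly; each result is a DOMAttr
    foreach ($xpath->query('//a[contains(@class, "fte_standardlink")]/@href') as $attr) {
        $links[] = $attr->value;
    }
    return $links;
}

// Shortened version of the markup from the question
$htmlString = '<ul><li class="fte_newsarchivelistleft"><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015</a></li>'
    . '<li class="fte_newsarchivelistright">komentarzy: <span class="fte_standardlink">[0]</span></li></ul>';

$links = extractNewsLinks($htmlString);
print_r($links);
```

The span in the second list item also has the fte_standardlink class, but it is skipped because the expression only matches a elements that have an href.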

Another example that uses a DOM parser and XPath (to show you what it can do) can be found here.

Replacing a node with its href value is just as simple:

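  • Get the parent node
  • Build a text node
  • Call replaceChild on the parent (the method is defined on DOMNode, which DOMDocument inherits from)
  • Finish by calling save to write to a file, or saveHTML / saveXML to get the DOM as a string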

An example:

$dom = new DOMDocument;//create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);//create XPath instance for dom, so we can query using xpath
$elemsWithHref = $xpath->query('//*[@href]');//get any node that has an href attribute
foreach ($elemsWithHref as $node)
{
    $parent = $node->parentNode;
    $replace = new DOMText($node->getAttributeNode('href')->value);//create text node
    $parent->replaceChild($replace, $node);//replaces $node with $replace textNode
}
$newString = $dom->saveHTML();
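
To try the replacement on markup like the one in the question, something along these lines should work. It is only a sketch and deviates from the code above in three places, all of them my own choices: the text node is created with $dom->createTextNode so that it belongs to the document, libxml warnings are collected rather than printed, and saveHTML is given the ul element so that only that element is serialised instead of the whole document (with no argument, the output also includes the doctype and the html/body wrappers that loadHTML adds). The shortened $htmlString is illustrative.

```php
<?php
// Shortened, illustrative version of the markup from the question
$htmlString = '<ul><li><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015</a></li></ul>';

// Collect parser warnings internally instead of printing them
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@href]') as $node) {
    // Build the text node through the document, then swap it in for the element
    $text = $dom->createTextNode($node->getAttribute('href'));
    $node->parentNode->replaceChild($text, $node);
}

// Serialise only the <ul> element rather than the whole document
$ul = $dom->getElementsByTagName('ul')->item(0);
$newString = $dom->saveHTML($ul);
echo $newString, "\n";
```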