Matching HTML links with a specific title

Time: 2012-01-09 13:24:49

Tags: php html-parsing

How can I retrieve the URL from an HTML link whose title starts with a specific string? E.g.:

<a href="http://urltoretrieve.ext/" title="specific title rest of all title">something</a>
<a href="http://otherurl.ext/" title="a generic title">somethingelse</a>

and use PHP to retrieve:

http://urltoretrieve.ext/

Thanks!

1 answer:

Answer 0: (score: 3)

You can use https://gist.github.com/1358174 with this XPath:

//a[starts-with(@title, "specific title")]/@href

The query means:

//a                      find all a elements in the html
[                        that
starts-with(             
    @title               has a title attribute
    "specific title"     starting with this value
)                        
]                        
/@href                   and return their href attribute

Example (demo):

$result = xpath_match_all(
    '//a[starts-with(@title, "specific title")]/@href', 
    $yourHtmlAsString
);

Output:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(38) "<href>http://urltoretrieve.ext/</href>"
  }
  [1]=>
  array(1) {
    [0]=>
    string(25) "http://urltoretrieve.ext/"
  }
}

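The result is an array holding the serialized outerHTML (index 0) and innerHTML (index 1) of the attribute nodes that were found. If you are not sure what a node is, see "DOMDocument in php".

See also "How do you parse and process HTML/XML in PHP?"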
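
If pulling in the gist is not an option, roughly the same query can be run with PHP's built-in DOMDocument and DOMXPath classes. The following is only a minimal sketch: the function name hrefs_by_title_prefix is hypothetical (it is not part of the gist), and it assumes the title prefix contains no double quote, because the prefix is pasted straight into the XPath string.

```php
<?php
// Hypothetical helper, not taken from the gist: returns the href values of
// all <a> elements whose title attribute starts with $prefix.
function hrefs_by_title_prefix(string $html, string $prefix): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely well-formed, so keep libxml warnings internal
    // while parsing and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Assumption: $prefix contains no double quote (XPath 1.0 has no escaping).
    $query = sprintf('//a[starts-with(@title, "%s")]/@href', $prefix);

    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        // Each match is an attribute node; nodeValue is the attribute's text.
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

$html = <<<HTML
<a href="http://urltoretrieve.ext/" title="specific title rest of all title">something</a>
<a href="http://otherurl.ext/" title="a generic title">somethingelse</a>
HTML;

$urls = hrefs_by_title_prefix($html, 'specific title');
print_r($urls);
```

With the two links from the question this should yield a single-element array containing http://urltoretrieve.ext/. Unlike the output shown above, it returns plain attribute values instead of serialized inner and outer HTML.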