How to get the text of a child node with PHP DOMDocument

Asked: 2016-05-12 00:33:05

Tags: php web-crawler domdocument

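I have been writing PHP code to fetch information from a website. So far I am able to get the href attribute, but I cannot find a way to get the text from the child node "span". Can anyone help me?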

Here is the HTML. So far I can get the href from it:

<a class="js-publication" href="publication/247931167"> 
    <span class="publication-title">An approach for textual authoring</span> 
</a>

1 answer:

Answer 0 (score: 1)

You can use DOMXPath:

$html = <<< LOL
<a class="js-publication" href="publication/247931167"> 
    <span class="publication-title">An approach for textual authoring</span> 
</a>
LOL;

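$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every <a> whose class attribute is exactly "js-publication"
foreach ($xpath->query("//a[@class='js-publication']") as $element) {
    echo $element->getAttribute('href'), PHP_EOL;
    // textContent concatenates the text of all descendants, including the <span>;
    // trim() drops the surrounding whitespace and line breaks
    echo trim($element->textContent), PHP_EOL;
}
// Output:
// publication/247931167
// An approach for textual authoring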
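
If the page contains several such links, or the anchor holds more than just the title span, it may be safer to query the span relative to each anchor instead of reading the anchor's whole textContent. The following is only a sketch under assumptions: it requires PHP 7 or later, the helper name extractPublications and the second link in the sample markup are made up for illustration, and the class names are taken from the question.

```php
<?php
// Hypothetical helper: returns an array of href => title for every publication link.
function extractPublications(string $html): array
{
    $dom = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query("//a[@class='js-publication']") as $a) {
        // The second argument is the context node, so the relative path
        // "span[...]" is evaluated against this <a> only.
        $span = $xpath->query("span[@class='publication-title']", $a)->item(0);
        if ($span !== null) {
            $result[$a->getAttribute('href')] = trim($span->textContent);
        }
    }
    return $result;
}

// Sample markup: the first link is from the question, the second is invented.
$html = <<<HTML
<a class="js-publication" href="publication/247931167">
    <span class="publication-title">An approach for textual authoring</span>
</a>
<a class="js-publication" href="publication/1">
    <span class="publication-title">Made-up second title</span>
</a>
HTML;

print_r(extractPublications($html));
```

Anchors without a matching span are skipped, so the result only contains links for which a title was actually found.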

Or, without the loop, if you only want a single element:

// item(0) returns the first match, or null if nothing matched
echo $xpath->query("//a[@class='js-publication']/span")->item(0)->textContent, PHP_EOL;
echo $xpath->query("//a[@class='js-publication']")->item(0)->getAttribute('href'), PHP_EOL;
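
Another option for the single-element case is to let XPath return a string directly through DOMXPath::evaluate(), which avoids dereferencing a node that may not exist. This is a sketch assuming the same markup as above; the variable names are illustrative.

```php
<?php
$html = <<<HTML
<a class="js-publication" href="publication/247931167">
    <span class="publication-title">An approach for textual authoring</span>
</a>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// normalize-space() and string() both take the first matching node in document
// order and return its text; if nothing matches, the result is an empty string.
// normalize-space() additionally trims and collapses whitespace.
$title = $xpath->evaluate("normalize-space(//a[@class='js-publication']/span)");
$href  = $xpath->evaluate("string(//a[@class='js-publication']/@href)");

echo $href, PHP_EOL;  // publication/247931167
echo $title, PHP_EOL; // An approach for textual authoring
```

Because a missing node yields an empty string rather than null, a simple comparison with '' is enough to detect that the link was not found.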
