PHP DOMXPath does not fetch the URL

Asked: 2012-10-19 00:04:15

Tags: php xpath web-scraping domxpath

I am trying to get the data between these tags:

<item><title>White Paper and Sales copy writer by stelladonsai</title>
<link>http://www.freelancer.com/projects/Technical-Writing-Academic-Writing/White-Paper-Sales-copy-writer.html</link>
<description>I need native English speaker to write a piece of white paper and a Sales copy.    *you must have experience.   *you must be native English speaker  *you must have sample to be attached with your cover... (Budget: &#36;30-&#36;250 USD, Jobs: Academic Writing, Technical Writing)</description>
<pubDate>Thu, 18 Oct 2012 17:23:10 -0400</pubDate>
<guid isPermaLink="false">Freelancer.com_project_2578329</guid>
</item>

The PHP code I am using is:

$html = new DOMDocument();
@$html->loadHTML($txt);
$xpath = new DOMXPath( $html );

/*Begin Get Titles*/
$bquery = $xpath->query( "//item/title" );
$i=0;
foreach ($bquery as $b){
    $i=$i+1;
    $title[$i] = $b->nodeValue; 
}
/*End Get Titles*/

/*Begin Get Links*/
$bquery = $xpath->query( "//item/link" );
$i=0;
foreach ($bquery as $b){
    $i=$i+1;
    $link[$i] = $b->nodeValue;  
}
/*End Get Links*/

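$txt comes from an RSS feed that I fetch with cURL. The code above gets the titles fine, along with any other field I want. But when I try to scrape and echo the link field, I get nothing back. (I tried replacing //item/link with other fields, and they all work.)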

I'm not sure what I am doing wrong here. Any help is greatly appreciated.

1 answer:

Answer 0 (score: 1)

When you are loading XML rather than HTML, use loadXML instead of loadHTML.

http://codepad.org/0FYXiwMy
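
For reference, here is a minimal sketch of that change applied to the code from the question. It assumes the feed is well-formed XML; the inline $txt below is a trimmed stand-in for the cURL response, and the <rss>/<channel> wrapper and the $doc and $links names are illustrative rather than taken from the original. The likely reason only link came back empty is that the HTML parser treats <link> as an empty (void) element, so the URL text never ends up inside the node, whereas the XML parser keeps it as the element's content.

```php
<?php
// Stand-in for the cURL response (assumption: trimmed copy of the item from the question).
$txt = <<<'XML'
<rss version="2.0">
<channel>
<item><title>White Paper and Sales copy writer by stelladonsai</title>
<link>http://www.freelancer.com/projects/Technical-Writing-Academic-Writing/White-Paper-Sales-copy-writer.html</link>
<guid isPermaLink="false">Freelancer.com_project_2578329</guid>
</item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($txt);            // parse as XML, not HTML
$xpath = new DOMXPath($doc);

// Collect the text of every <link> that is a direct child of <item>.
$links = [];
foreach ($xpath->query('//item/link') as $node) {
    $links[] = trim($node->nodeValue);
}

print_r($links);
```

The same loop works unchanged for //item/title and the other fields. Since loadXML is stricter than loadHTML, a malformed feed will make it return false, so checking its return value is safer than suppressing errors with @.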