R: Trouble building an XPath expression when web scraping in R

Time: 2013-10-19 20:13:42

Tags: r xpath

I am scraping the following website:

   http://www.crowdrise.com/waterforpeople-SE

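If you look at this page, on the right-hand side, just above the black button labelled "Fundraise for this campaign", there is a statement that reads: 52% Raised of $20,000 Goal. That statement is what I am trying to extract.

I tried this XPath expression: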

  .//*[@id="thebody"]/div[6]/div/div/div[2]/div[2]/div[2]/div/p/span

But it does not work.

What is the correct XPath expression?

Thanks,

1 Answer:

Answer 0: (score: 1)

Try this:

> library(XML)
> doc <- htmlTreeParse('http://www.crowdrise.com/waterforpeople-SE', useInternalNodes = TRUE)
> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]')
[[1]]
<p class="progressText">
  <span>52% Raised of $20,000 Goal</span>
</p> 

attr(,"class")
[1] "XMLNodeSet"

Or go straight to the text value:

> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]', xmlValue)
[[1]]
[1] "52% Raised of $20,000 Goal"
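
To try the same approach without depending on the live page, here is a minimal sketch that runs the XPath against an inline HTML string. The fragment is hypothetical: it is modeled only on the node printed above, and the real page contains far more markup. The positional path at the end is a shortened stand-in for the one in the question, and the idea that the served HTML has a different div nesting than the browser's DOM is an assumption, not something verified against the site. The names html, progress, pct and goal are illustrative.

```r
library(XML)

# Hypothetical fragment modeled on the node shown in the answer (not the real page source)
html <- '<html><body id="thebody">
  <div class="grid1-4">
    <div><p class="progressText"><span>52% Raised of $20,000 Goal</span></p></div>
  </div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Attribute-based path: matches no matter how many wrapper divs surround the node
progress <- xpathSApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]', xmlValue)
progress <- trimws(progress)

# Split the statement into its numeric parts with base R regular expressions
pct  <- as.numeric(sub("%.*", "", progress))
goal <- as.numeric(gsub("[^0-9]", "", sub(".*\\$", "", progress)))

# Positional path: finds nothing here because this fragment has no sixth div under the body
n_positional <- length(getNodeSet(doc, '//*[@id="thebody"]/div[6]//span'))
```

Selecting on class attributes rather than on div positions tends to survive layout changes better, which is presumably why the expression in the answer works where the positional one did not.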