Question

I am scraping the following website:

   http://www.crowdrise.com/waterforpeople-SE

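If you look at the page, on the right-hand side, directly above the black button labeled "Fundraise for this campaign", there is a statement that reads "52% Raised of $20,000 Goal". That statement is what I am trying to extract.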

I tried this XPath expression:

  .//*[@id="thebody"]/div[6]/div/div/div[2]/div[2]/div[2]/div/p/span

But it doesn't work...

What is the correct XPath expression?

Thanks,

Answer 1

Try this, which selects the element by its class attributes rather than by its position in the tree:

> library(XML)
> doc <- htmlTreeParse('http://www.crowdrise.com/waterforpeople-SE', useInternalNodes = TRUE)
> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]')
[[1]]
<p class="progressText">
  <span>52% Raised of $20,000 Goal</span>
</p> 

attr(,"class")
[1] "XMLNodeSet"

Or go straight to the text value:

> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]', xmlValue)
[[1]]
[1] "52% Raised of $20,000 Goal"
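
If you also need the numbers, here is a minimal sketch of the same class-based lookup followed by a little string parsing. It runs against an inline HTML snippet instead of the live page, since the site's markup may have changed. The snippet is my own hypothetical reconstruction based on the output above, not the page's real source, and the variable names (html, txt, pct, goal) are just for illustration.

```r
library(XML)

# Hypothetical markup, reconstructed from the output shown above.
# The real page may be structured differently.
html <- '<html><body id="thebody">
  <div class="grid1-4">
    <p class="progressText"><span>52% Raised of $20,000 Goal</span></p>
  </div>
</body></html>'

# Parse the string (asText = TRUE) rather than fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# Select by class instead of by position, then pull out the text.
txt <- xpathSApply(doc,
                   '//div[@class="grid1-4"]//p[@class="progressText"]',
                   xmlValue)
txt <- trimws(txt)

# Everything before the first "%" is the percentage raised.
pct <- as.numeric(sub("%.*$", "", txt))

# The digits and commas after "$" are the goal; drop the commas.
goal <- as.numeric(gsub(",", "", sub("^.*\\$([0-9,]+).*$", "\\1", txt)))

print(txt)
print(pct)
print(goal)
```

A long positional path such as div[6]/div/div/div[2]/... depends on the exact nesting of the page, so it tends to break whenever the parsed tree differs even slightly from what the browser shows. Matching on class names is usually more robust, although it still assumes those class names stay the same.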

R: Trouble generating an XPath expression when web scraping in R
