Scraping result URLs from a Google search with R httr

Time: 2014-01-02 17:45:31

Tags: xml r rcurl httr

I want to extract the result URLs from a Google web search. I fetch the page like this:

library(httr)
library(XML)  # provides xpathApply() and xmlAttrs()
search.term="httr+package+daterange:%3A2456294-2456659"
url.name=paste0("https://www.google.com/search?q=",search.term)
url.get=GET(url.name)
url.content=content(url.get)

Then my attempt to pull the links out of the result fails:

links <- xpathApply(url.content, "//h3//a[@href]", function(x) xmlAttrs(x)[[1]])
Error in UseMethod("xpathApply") : 
no applicable method for 'xpathApply' applied to an object of class "XMLDocumentContent"

What is the best way to get the links out of url.content?

1 answer:

Answer 0 (score: 5)

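Call content() with as="text" so that it returns the raw HTML as a string instead of an object of class XMLDocumentContent, then parse that string yourself with htmlParse():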

library(httr)
library(XML)  # provides htmlParse() and xpathSApply()
search.term="httr+package+daterange:%3A2456294-2456659"
url.name=paste0("https://www.google.com/search?q=",search.term)
url.get=GET(url.name)
url.content=content(url.get, as="text")
links <- xpathSApply(htmlParse(url.content), "//a/@href")
head(links,3)
# href 
# "https://www.google.com/webhp?tab=ww" 
# href 
# "https://www.google.com/search?q=httr%2Bpackage%2Bdaterange::2456294-2456659&um=1&ie=UTF-8&hl=en&tbm=isch&source=og&sa=N&tab=wi" 
# href 
# "https://maps.google.com/maps?q=httr%2Bpackage%2Bdaterange::2456294-2456659&um=1&ie=UTF-8&hl=en&sa=N&tab=wl" 
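To see the parse-then-query step in isolation, here is a minimal offline sketch using the same XML functions on a small HTML string. The markup and the example.com URLs are made up for illustration and are not Google's real result page. It also shows that narrowing the XPath to //h3//a/@href, as in the question's original attempt, keeps only the links nested inside h3 headings and drops navigation links.

```r
# Offline sketch: parse HTML text, then query it with XPath (XML package).
# The HTML below is hypothetical; it only mimics "nav link + result links".
library(XML)

page <- '<html><body>
  <a href="https://www.google.com/webhp">nav link</a>
  <h3><a href="https://example.com/result-1">Result 1</a></h3>
  <h3><a href="https://example.com/result-2">Result 2</a></h3>
</body></html>'

# asText = TRUE: treat `page` as HTML content rather than a file name
doc <- htmlParse(page, asText = TRUE)

# Every href attribute on the page (named "href", like the answer's output)
all.links <- xpathSApply(doc, "//a/@href")

# Only hrefs of links that sit inside <h3> elements
result.links <- xpathSApply(doc, "//h3//a/@href")
print(unname(result.links))
```

Which XPath selects the actual search results depends on Google's markup at the time you run it, so inspect the page before relying on //h3//a.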

Update

Jake pointed out in the comments that this also works:

url.get=GET(url.name)
links <- xpathSApply(htmlParse(url.get), "//a/@href")
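This thread dates from 2014. In httr 1.0 and later, content() parses text/html responses with xml2::read_html(), so the parsed result is an xml2 "xml_document" rather than an XML-package object, and the xml2 query functions apply to it directly. A minimal sketch of the equivalent extraction, assuming a current httr with xml2 installed, again run on a made-up HTML string so it works offline:

```r
# Sketch assuming httr >= 1.0, where content() parses HTML via xml2.
# With a live response, doc <- content(GET(url.name)) should give an
# equivalent "xml_document"; a hypothetical HTML string stands in here.
library(xml2)

page <- '<html><body>
  <a href="https://www.google.com/webhp">nav link</a>
  <h3><a href="https://example.com/result-1">Result 1</a></h3>
</body></html>'

# read_html() treats a string containing "<" as literal HTML
doc <- read_html(page)

# xml_find_all() selects nodes by XPath; xml_attr() extracts one attribute
all.links    <- xml_attr(xml_find_all(doc, "//a"), "href")
result.links <- xml_attr(xml_find_all(doc, "//h3//a"), "href")
print(result.links)
```

xml_attr() returns a plain character vector, so no unname() or as.character() step is needed afterwards.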