Question

I am using the XML and RCurl packages in R to fetch data from the first page of search results:

    library(RCurl)
    library(XML)

    site <- getForm("http://www.google.com/search", hl = "en", lr = "", q = "life of pi", btnG = "Search")   # q is the search query
    doc <- htmlParse(site, asText = TRUE)
    plain.text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)

What should my xpathSApply argument be so that I get only the first line of each search result (the blue ones in the larger font)?

Answer 1

Before reaching for the not(ancestor::...) filters, it may be easier to start from the headings or other tags:

    xpathSApply(doc, "//h3", xmlValue)
     [1] "LIFE OF PI - Buy it on Digital HD, Blu-ray & DVD"
     [2] "Life of Pi - Wikipedia, the free encyclopedia"
     [3] "Life of Pi (film) - Wikipedia, the free encyclopedia"
     [4] "Images for life of pi"
     [5] "Life of Pi (2012) - IMDb"
     ...
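
To try the heading-based XPath without hitting Google, here is a minimal sketch that parses an inline HTML string. The markup, URLs, and titles are made up for illustration; Google's real result pages are structured differently and change over time, so treat the `//h3/a` path as an assumption to verify against the page you actually receive.

```r
library(XML)

# Hypothetical, simplified results markup (NOT Google's real HTML)
html <- '<html><body>
  <script>var x = "ignored";</script>
  <div><h3><a href="http://example.com/a">First result title</a></h3><span>snippet one</span></div>
  <div><h3><a href="http://example.com/b">Second result title</a></h3><span>snippet two</span></div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Text of every <h3> heading, i.e. the result titles in this sample
titles <- xpathSApply(doc, "//h3", xmlValue)

# href attribute of the anchor inside each heading
links <- xpathSApply(doc, "//h3/a", xmlGetAttr, "href")

print(titles)
print(links)
```

Selecting `//h3` directly skips the snippet and script text, so no `not(ancestor::...)` predicates are needed in this sample.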

Getting the first line of search results from Google
