I'm using the XML and RCurl packages in R to scrape data from the first page of Google search results:
library(RCurl)
library(XML)
site <- getForm("http://www.google.com/search", hl="en", lr="", q="life of pi", btnG="Search") # q = the search query
doc <- htmlParse(site, asText=TRUE)
plain.text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
What should the xpathSApply argument be so that I get only the first line of each search result (the larger blue titles)?
Answer (score: 0)
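Rather than filtering all text nodes with not(ancestor::...), start by selecting the heading (or other) tags directly. The result titles are in <h3> elements: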
xpathSApply(doc, "//h3", xmlValue)
[1] "LIFE OF PI - Buy it on Digital HD, Blu-ray & DVD"
[2] "Life of Pi - Wikipedia, the free encyclopedia"
[3] "Life of Pi (film) - Wikipedia, the free encyclopedia"
[4] "Images for life of pi"
[5] "Life of Pi (2012) - IMDb"
...
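Here is a minimal sketch, assuming only that the XML package is installed. It runs the same `//h3` query against a small hard-coded HTML string so it can be tried offline. The markup is hypothetical and simplified, not Google's actual page. It also shows `//h3/a` combined with `xmlGetAttr` to pull the link behind each title.

```r
library(XML)

# Hypothetical, simplified markup standing in for a results page;
# Google's real HTML is different and changes over time.
html <- '<html><body>
  <div class="g"><h3><a href="/url?q=a">Life of Pi - Wikipedia</a></h3>
    <span>Snippet text that should not be returned</span></div>
  <div class="g"><h3><a href="/url?q=b">Life of Pi (2012) - IMDb</a></h3>
    <span>Another snippet</span></div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Title text: xmlValue of each <h3> (includes the text of the nested <a>)
titles <- xpathSApply(doc, "//h3", xmlValue)

# Link targets: extra arguments to xpathSApply are passed on to the function,
# so this calls xmlGetAttr(node, "href") on each <a> inside an <h3>
links <- xpathSApply(doc, "//h3/a", xmlGetAttr, "href")

print(titles)
print(links)
```

Only the `<h3>` text comes back, and the snippet `<span>` text is skipped. On a live page, inspect the returned HTML first, because the tag structure Google serves may not match this sketch.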