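I have read all the questions about web scraping in R, but I still cannot solve my problem. I want to get the names of the pictures (see the URL below) and the details of each picture.
I realize I have to use xpathSApply and a loop to get the information about each picture. But right now I cannot even get the picture names from http://www.wikiart.org/en/search/monet/11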
library(XML)
url = "http://www.wikiart.org/en/search/monet/1#supersized-search-211804"
doc = htmlTreeParse(url, useInternalNodes=T)
pictureName = xpathSApply(doc,"//a[contains(@href, 'title')]",xmlValue)
pictureName
## list()
Why does it give me list()?
Answer 0 (score: 2)
Try this:
pictureNames <- xpathSApply(doc,"//a[@class='big rimage']/@title", unname)
which gives:
> head(pictureNames)
[1] "Camille and Jean Monet in the Garden at Argenteuil - Claude Monet"
[2] "Camille Monet at the Window, Argentuile - Claude Monet"
[3] "Camille Monet in the Garden - Claude Monet"
[4] "Camille Monet in the Garden at the House in Argenteuil - Claude Monet"
[5] "Camille Monet on a Garden Bench - Claude Monet"
[6] "Camille Monet On Her Deathbed - Claude Monet"
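A likely reason the original query returns list() is that contains(@href, 'title') looks for the literal text "title" inside each link's href value, whereas the picture name is stored in a separate title attribute. The sketch below illustrates the difference on a small inline HTML string, so it runs without network access. The markup, the href paths, and the variable names html, byHref, and byTitle are made up for illustration; they are assumptions and not the actual structure of the WikiArt page.

```r
library(XML)

# Hypothetical markup imitating two search results; the real page may differ.
html <- '<html><body>
  <a class="big rimage" href="/en/claude-monet/example-1" title="Example One - Claude Monet">x</a>
  <a class="big rimage" href="/en/claude-monet/example-2" title="Example Two - Claude Monet">y</a>
</body></html>'

# Parse the string itself (asText = TRUE) instead of fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# Query from the question: no href contains the text "title", so nothing matches.
byHref <- xpathSApply(doc, "//a[contains(@href, 'title')]", xmlValue)
length(byHref)

# Query from the answer: select the title attribute of each matching link.
byTitle <- xpathSApply(doc, "//a[@class='big rimage']/@title", unname)
byTitle
```

An equivalent way to read the attribute is xpathSApply(doc, "//a[@class='big rimage']", xmlGetAttr, "title"), which selects the link nodes first and then extracts the attribute from each one.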