Web scraping in R using xpathSApply

Time: 2014-07-01 13:20:24

Tags: r web web-scraping

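I have read all the questions about web scraping in R, but I still cannot solve my problem. I want to get the names of the pictures (see the URL below) and the details of each picture. I realize I will have to use xpathSApply and a loop to get the information about each picture. But right now I am stuck even on getting a single name from http://www.wikiart.org/en/search/monet/11:
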
    library(XML)
    url = "http://www.wikiart.org/en/search/monet/1#supersized-search-211804"
    doc = htmlTreeParse(url, useInternalNodes=T)
    pictureName = xpathSApply(doc,"//a[contains(@href, 'title')]",xmlValue)
    pictureName
    ## list()

Why does it give me list()?

1 answer:

Answer 0: (score: 2)

Try this:

    pictureNames <- xpathSApply(doc,"//a[@class='big rimage']/@title", unname)

which gives:

    > head(pictureNames)
    [1] "Camille and Jean Monet in the Garden at Argenteuil - Claude Monet"
    [2] "Camille Monet at the Window, Argentuile - Claude Monet"
    [3] "Camille Monet in the Garden - Claude Monet"
    [4] "Camille Monet in the Garden at the House in Argenteuil - Claude Monet"
    [5] "Camille Monet on a Garden Bench - Claude Monet"
    [6] "Camille Monet On Her Deathbed - Claude Monet"
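
To see why the two XPath expressions behave differently, here is a minimal sketch that runs both against a small inline HTML snippet instead of the live site. The snippet is made up for illustration (the real page markup may differ, and may have changed since 2014); it only assumes, as the XPath above suggests, that each picture link carries the class `big rimage` and keeps the name in its `title` attribute. The `xmlGetAttr` variant at the end is an alternative I am adding, not part of the original answer.

```r
library(XML)

# Hypothetical stand-in for the search results page (not the real markup).
html <- '<html><body>
  <a class="big rimage" href="/en/claude-monet/a" title="Painting A - Claude Monet">x</a>
  <a class="big rimage" href="/en/claude-monet/b" title="Painting B - Claude Monet">y</a>
  <a class="small" href="/en/other" title="Not a painting">z</a>
</body></html>'

# Parse the string into an internal document that XPath can query.
doc <- htmlParse(html, asText = TRUE)

# XPath from the question: keeps only links whose *href* contains the text
# "title". No href here does, so nothing matches and the result is empty.
no_match <- xpathSApply(doc, "//a[contains(@href, 'title')]", xmlValue)

# XPath from the answer: selects the title *attribute* of the picture links;
# unname drops the attribute name from each returned value.
titles <- xpathSApply(doc, "//a[@class='big rimage']/@title", unname)

# Alternative: select the link nodes and read the attribute with xmlGetAttr.
titles_alt <- xpathSApply(doc, "//a[@class='big rimage']", xmlGetAttr, "title")

print(titles)
```

If the real page behaves like this snippet, the original query most likely came back empty because it tested the link target for the word "title" rather than reading the `title` attribute.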