R list of lists: check XPath before applying

Date: 2014-08-28 16:53:11

Tags: r xpath web-scraping

Following on from my previous question, "How to Check if XPath Exists", I've run into a strange quirk that has me stumped.

Given the following code, why does the test for meta[2] work correctly, while the test for meta[3] always returns an empty item?

Can anyone explain why this happens, or how to fix it? Cheers.

require(XML)
require(RCurl)

urls      <- list("http://www.coindesk.com/information")
for (i in seq_along(urls)) 
{
  parsed  <- htmlParse(urls[i])
  meta    <- list()
  meta[1] <- urls[i]
  meta[2] <- if(length(xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr,"content"))==0) 
             {  
               "Desc NA" 
             } 
             else 
             {
               xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr,"content")
             }    
  meta[3]  <- if(length(paste(xpathApply(parsed, "//meta[starts-with(@property, \"article:tag\")]", xmlGetAttr,"content"), collapse = ','))==0) 
             {
               "Tags NA"
             } 
             else 
             {
               paste(xpathApply(parsed, "//meta[starts-with(@property, \"article:tag\")]", xmlGetAttr,"content"), collapse = ',')   
             }
}
print(meta)

[[1]]
[1] "http://www.coindesk.com/information"

[[2]]
[1] "Desc NA"

[[3]]
[1] ""


Answer (score: 1)

This happens because of the paste() you added. Note that when xpathApply() finds nothing, it returns list(), which has length 0. But when you pass that through paste():

paste(list(), collapse=",")
# [1] ""

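it actually returns a character vector of length 1 containing the empty string, so length(...) == 0 is never TRUE and you always get "" instead of "Tags NA". It is better to take the if test out of paste() and check the length of the raw xpathApply() result instead: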
 if(length(xpathApply(parsed, "//meta[starts-with(@property, \"article:tag\")]", xmlGetAttr,"content"))==0)
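A minimal base-R sketch of that fix, assuming you want a single fallback string when nothing matches. It first shows why the original check never fires, then wraps the "test first, collapse second" logic in a helper. collapse_or_na() is a hypothetical name (it is not part of XML or RCurl), and the tag values below are made-up inputs standing in for what xpathApply(..., xmlGetAttr, "content") would return.

```r
# Why the original check never fires: paste(collapse = ",") always
# returns a length-1 character vector, even for an empty input.
found <- list()                          # what xpathApply() returns on no match
length(found)                            # 0
length(paste(found, collapse = ","))     # 1 (the single string "")

# Hypothetical helper: test the raw matches first, then collapse them.
collapse_or_na <- function(values, na_label, sep = ",") {
  if (length(values) == 0) {
    na_label                             # nothing matched: return the fallback
  } else {
    paste(unlist(values), collapse = sep)  # join matched values into one string
  }
}

collapse_or_na(list(), "Tags NA")                     # "Tags NA"
collapse_or_na(list("bitcoin", "mining"), "Tags NA")  # "bitcoin,mining" (made-up tags)

# Assumed usage inside the original loop (not run here, needs XML and a parsed page):
# tags    <- xpathApply(parsed, "//meta[starts-with(@property, \"article:tag\")]",
#                       xmlGetAttr, "content")
# meta[3] <- collapse_or_na(tags, "Tags NA")
```

Storing the xpathApply() result in a variable also means the XPath query runs once instead of twice per field.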