Question

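I am trying to write some code that returns the value of a given element from an XML feed. The code below works for every feed except uk_legislation_feed. Can anyone give me a hint as to why that is, and how to fix it? Thanks.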

library(XML)

uk_legislation_feed <- c("http://www.legislation.gov.uk/new/data.feed", "xml", "//title")
test_feed <- c("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml", "xml", "//zipcode")
ons_feed <- c("https://www.ons.gov.uk/releasecalendar?rss", "xml", "//title")

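read_data <- function(feed) {
  # feed is c(url, format, xpath)
  if (feed[2] == "xml") {
    # feed[1] is a URL, not a local path, so file.exists() is FALSE and the
    # feed is downloaded to tmp.xml on every call
    if (!file.exists(feed[1])) download.file(feed[1], "tmp.xml", "curl")
    dat <- xmlRoot(xmlTreeParse("tmp.xml", useInternalNodes = TRUE))
  }
  # evaluate the feed's XPath expression (e.g. "//title") and return the text of each match
  titles <- xpathSApply(dat, feed[3], xmlValue)

  return(titles)
}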
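A minimal offline reproduction of the symptom may help. It is a sketch that assumes the failing feed is an Atom document whose root element carries a default (unprefixed) namespace declaration; `atom_xml` and `plain_xml` are made-up miniature documents, not the real feeds. The only difference between them is the `xmlns` attribute, yet the same `//title` expression behaves differently.

```r
library(XML)

# Made-up miniature Atom document: the xmlns attribute has no prefix, so
# <feed>, <title> and <entry> all live in the Atom namespace.
atom_xml <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Search Results</title>
  <entry><title>Example entry</title></entry>
</feed>'

# The same structure without any namespace declaration.
plain_xml <- '<feed>
  <title>Search Results</title>
  <entry><title>Example entry</title></entry>
</feed>'

atom_doc  <- xmlParse(atom_xml, asText = TRUE)
plain_doc <- xmlParse(plain_xml, asText = TRUE)

# An unprefixed name in XPath only matches elements that are in no namespace.
atom_titles  <- xpathSApply(atom_doc, "//title", xmlValue)   # no matches
plain_titles <- xpathSApply(plain_doc, "//title", xmlValue)  # both titles
```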

Answer 1

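uk_legislation_feed declares a default namespace, http://www.w3.org/2005/Atom, with a bare xmlns attribute (no prefix), so every element in the feed belongs to that namespace. XPath 1.0 has no notion of a default namespace: an unprefixed name such as //title only matches elements that are in no namespace, so nothing is found. You therefore need to bind a prefix to that namespace URI and use the prefix in the XPath expression: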

library(XML)

url <- "http://www.legislation.gov.uk/new/data.feed"
webpage <- readLines(url)

# Parse the downloaded lines and bind the prefix "ns" to the Atom namespace URI
doc <- xmlParse(webpage)
nmsp <- c(ns = "http://www.w3.org/2005/Atom")

titles <- xpathSApply(doc, "//ns:title", xmlValue,
                      namespaces = nmsp)
titles

# [1] "Search Results"
# [2] "The Air Navigation (Restriction of Flying) (RNAS Culdrose) (Amendment) Regulations 2016"
# ...
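The same fix can be folded back into a reusable function. The sketch below is hypothetical: `read_values()` is not part of the XML package, it takes the XML as a string instead of downloading a URL, and the `atom_xml` sample is made up. It shows the prefix binding from the answer alongside an XPath 1.0 `local-name()` predicate, which avoids declaring the namespace at all.

```r
library(XML)

# Hypothetical variant of read_data(): parses XML text and evaluates an XPath
# expression, optionally with prefix = URI namespace bindings.
read_values <- function(xml_text, xpath, namespaces = NULL) {
  doc <- xmlParse(xml_text, asText = TRUE)
  if (is.null(namespaces)) {
    xpathSApply(doc, xpath, xmlValue)
  } else {
    xpathSApply(doc, xpath, xmlValue, namespaces = namespaces)
  }
}

# Made-up miniature Atom document with a default (unprefixed) namespace.
atom_xml <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Search Results</title>
  <entry><title>Example entry</title></entry>
</feed>'

# Option 1: bind an arbitrary prefix ("ns") to the Atom URI and use it.
prefixed <- read_values(atom_xml, "//ns:title",
                        namespaces = c(ns = "http://www.w3.org/2005/Atom"))

# Option 2: match on the element's local name and ignore its namespace.
by_local_name <- read_values(atom_xml, "//*[local-name()='title']")
```

For the real feed, a call like `read_values(paste(readLines(url), collapse = "\n"), "//ns:title", namespaces = nmsp)` should behave like the answer's code, assuming the download succeeds. The `local-name()` form is shorter but less precise, because it also matches `title` elements from any other namespace in the document.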

xpathSApply does not find the desired nodes
