Question

Does anyone have experience importing data from an Atom-compliant data feed into R? I downloaded an ".atomsvc" file, opened it in Notepad, and got the following contents:

<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&amp;AssetNbr=51&amp;beg_date=05%2F03%2F2013%2000%3A00%3A00&amp;LocationNbr=%25&amp;LocationProp=%25&amp;LocationName=%25&amp;DirOfLow=%25&amp;rs%3AParameterLanguage=&amp;rs%3ACommand=Render&amp;rs%3AFormat=ATOM&amp;rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>

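I'm guessing I'll have to use RCurl to import this, but my experience with that package is limited, so I was hoping someone could point me in the right direction.

Any help would be greatly appreciated.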

Answer 1

Feeds are just information delivered as XML, so they can be parsed with the XML package.

library(XML)
url <- 'http://housesofstones.com/blog/feed/atom/'

# Download and parse the data
xml_data <- xmlParse(url)

# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)

str(head(xml_list))

List of 6
$ title   :List of 2
..$ text  : chr "Houses of Stones"
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ subtitle:List of 2
..$ text  : chr "\"Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a c"| __truncated__
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ updated : chr "2013-05-16T12:16:49Z"
$ link    : Named chr [1:3] "alternate" "text/html" "http://housesofstones.com/blog"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
$ id      : chr "http://housesofstones.com/blog/feed/atom/"
$ link    : Named chr [1:3] "self" "application/atom+xml" "http://housesofstones.com/blog/feed/atom/"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
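If you only need a few fields from each entry, XPath can be more direct than walking the nested list. Atom elements live in the Atom namespace (`http://www.w3.org/2005/Atom`), so the query has to bind a prefix to that URI. A minimal sketch, assuming a small made-up feed held in a string so it runs offline; the feed text and the prefix `a` are illustrative choices, not part of the original answer:

```r
library(XML)

# A tiny made-up Atom feed, inlined so the example needs no network access
atom_text <- '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title><updated>2013-05-16T12:16:49Z</updated></entry>
  <entry><title>Second post</title><updated>2013-05-10T08:00:00Z</updated></entry>
</feed>'

doc <- xmlParse(atom_text, asText = TRUE)

# Atom uses a default namespace, so XPath needs an explicit prefix ("a" is arbitrary)
ns <- c(a = "http://www.w3.org/2005/Atom")

# One row per <entry>, pulling the text of its <title> and <updated> children
entries <- data.frame(
  title   = xpathSApply(doc, "//a:entry/a:title", xmlValue, namespaces = ns),
  updated = xpathSApply(doc, "//a:entry/a:updated", xmlValue, namespaces = ns),
  stringsAsFactors = FALSE
)

print(entries)
```

The same pattern applies to a live feed: pass the URL (or downloaded text) to `xmlParse()` and adjust the XPath to the elements you need.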

Or, using your example data:

example_data <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&amp;AssetNbr=51&amp;beg_date=05%2F03%2F2013%2000%3A00%3A00&amp;LocationNbr=%25&amp;LocationProp=%25&amp;LocationName=%25&amp;DirOfLow=%25&amp;rs%3AParameterLanguage=&amp;rs%3ACommand=Render&amp;rs%3AFormat=ATOM&amp;rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>'

xml_data <- xmlParse(example_data)

# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)

str(xml_list)

List of 1
$ workspace:List of 2
..$ title     : chr "OperationallyAvailableCapacity"
..$ collection:List of 2
.. ..$ title : chr "table1"
.. ..$ .attrs: Named chr "http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3"| __truncated__
.. .. ..- attr(*, "names")= chr "href"
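The `href` can also be read straight off the `<collection>` element with XPath instead of navigating `xml_list`. A sketch, assuming the `.atomsvc` text from the question is held in a string; the prefixes `app` and `atom` are my own choices, bound to the namespace URIs the file declares. Note that the parser turns the `&amp;` escapes in the file into plain `&`.

```r
library(XML)

# The .atomsvc content from the question, read from a string
atomsvc <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&amp;AssetNbr=51&amp;beg_date=05%2F03%2F2013%2000%3A00%3A00&amp;LocationNbr=%25&amp;LocationProp=%25&amp;LocationName=%25&amp;DirOfLow=%25&amp;rs%3AParameterLanguage=&amp;rs%3ACommand=Render&amp;rs%3AFormat=ATOM&amp;rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>'

doc <- xmlParse(atomsvc, asText = TRUE)

# <collection> is in the AtomPub namespace (the file's default); <atom:title> is in Atom
ns <- c(app = "http://www.w3.org/2007/app", atom = "http://www.w3.org/2005/Atom")

# The extra "href" argument is passed through to xmlGetAttr for each matched node
feed_url   <- xpathSApply(doc, "//app:collection", xmlGetAttr, "href", namespaces = ns)
feed_title <- xpathSApply(doc, "//app:collection/atom:title", xmlValue, namespaces = ns)
```

`feed_url` is the address of the data feed itself, so it could in principle be passed to `xmlParse()` (or fetched with RCurl first) just like the first example. Keep in mind that 10.101.111.234 is a private address, so it will only be reachable from inside that network.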

EDIT

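On closer inspection, your particular example data, for whatever reason, packs a lot of information into a single node, URL-encoded. If you need that data, you'll have to pull it out.

First, grab that single node and decode the URL so it's easier to parse: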

xml_content <- URLdecode(xml_list$workspace$collection$.attrs)

The individual parameters are separated by "&", so split the string on that character:

xml_content <- unlist(strsplit(xml_content, "&"))

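Each new string contains a parameter name and its value, separated by an equals sign. There are several ways to split these apart; perhaps the simplest is the str_split_fixed function from the stringr package: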

require(stringr)
str_split_fixed(xml_content, "=", 2)

      [,1]                                                                          [,2]
 [1,] "http://10.101.111.234/ReportServer?/InfoPost/OperationallyAvailableCapacity" ""
 [2,] "AssetNbr"                                                                    "51"
 [3,] "beg_date"                                                                    "05/03/2013 00:00:00"
 [4,] "LocationNbr"                                                                 "%"
 [5,] "LocationProp"                                                                "%"
 [6,] "LocationName"                                                                "%"
 [7,] "DirOfLow"                                                                    "%"
 [8,] "rs:ParameterLanguage"                                                        ""
 [9,] "rs:Command"                                                                  "Render"
[10,] "rs:Format"                                                                   "ATOM"
[11,] "rc:DataFeed"                                                                 "xAx0x13"
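One caveat with decoding the whole URL before splitting: if a parameter value ever contained an encoded `&` (`%26`) or `=` (`%3D`), decoding first would split it in the wrong place. A base-R sketch that splits first and decodes each name and value afterwards, returning a named character vector. `split_query` is a hypothetical helper name, and `href` is the undecoded string from the question's `.atomsvc` file with `&amp;` already unescaped (as `xmlParse` returns it).

```r
# Hypothetical helper: split a URL's query string into a named vector of decoded values
split_query <- function(href) {
  query <- sub("^[^?]*\\?", "", href)                # drop everything up to the first "?"
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]   # split on "&" while still encoded
  # URLdecode is applied element by element (older R versions are not vectorised)
  decode <- function(s) vapply(s, URLdecode, character(1), USE.NAMES = FALSE)
  has_eq <- grepl("=", pairs, fixed = TRUE)
  keys   <- decode(sub("=.*$", "", pairs))           # text before the first "="
  values <- decode(ifelse(has_eq, sub("^[^=]*=", "", pairs), ""))  # text after it, or ""
  setNames(values, keys)
}

href <- "http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3A00&LocationNbr=%25&LocationProp=%25&LocationName=%25&DirOfLow=%25&rs%3AParameterLanguage=&rs%3ACommand=Render&rs%3AFormat=ATOM&rc%3ADataFeed=xAx0x13"

params <- split_query(href)
params[["beg_date"]]

# Or as a two-column data frame, mirroring the str_split_fixed matrix
data.frame(name = names(params), value = unname(params), stringsAsFactors = FALSE)
```

The first element has no "=", so it comes back with the decoded report path as its name and an empty value, matching the first row of the matrix.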

Importing data into R from an Atom-compliant data feed
