Parsing XML in R - returning a data frame object

Date: 2016-01-24 04:20:49

Tags: xml r

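I managed to read the XML from Example 1 into a data frame object in R, but I ran into a problem with Example 2. Can anyone suggest R code that converts the data in mtcars.xml into a data frame?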

Example 1)

library(XML)
# Save the URL of the xml file in a variable

xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml"

# Use the xmlTreeParse function to parse the XML file directly from the web

xmlfile <- xmlTreeParse(xml.url)

# Use the xmlRoot-function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes:
print(xmltop[1:2])


# To extract the XML-values from the document, use xmlSApply:

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
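
The call above returns a character matrix with one column per record, not yet a data frame. One possible last step is to transpose that matrix and pass it to data.frame. The sketch below assumes every record has the same child elements. To run offline it uses a small inline XML string, made up for illustration and loosely following the plant catalog layout, instead of the URL. The names sample_xml and plants_df are hypothetical.

```r
library(XML)

# Hypothetical two-record stand-in for the plant catalog
sample_xml <- '<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><ZONE>3</ZONE></PLANT>
</CATALOG>'

xmltop <- xmlRoot(xmlTreeParse(sample_xml, asText = TRUE))

# Fields end up as rows and records as columns
plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

# Transpose so each record is a row; row.names = NULL drops the
# repeated "PLANT" labels in favour of automatic row numbers
plants_df <- data.frame(t(plantcat), row.names = NULL,
                        stringsAsFactors = FALSE)
print(plants_df)
```

All columns come back as character, so numeric fields such as ZONE would still need as.numeric() or similar.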

Example 2)

library(XML)
# Parse the mtcars.xml example file that ships with the XML package

doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML"))


xmlfile <- xmlTreeParse(doc)

# Use the xmlRoot-function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes:
print(xmltop[1:2])


# To extract the XML-values from the document, use xmlSApply:

mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

2 Answers:

Answer 0: (score: 1)

Try xpathSApply, as follows:

library(XML)

path <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(path, useInternal = TRUE)
root <- xmlRoot(doc)

read.table(text = xpathSApply(root, "//record", xmlValue), 
           col.names = xpathSApply(root, "//variable", xmlValue))
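
The same pattern can also carry row names across. The sketch below assumes each record element has an id attribute holding the row label, and uses a hypothetical, abbreviated inline document in that layout so it runs without the example file. The names sample_xml and cars are made up for illustration.

```r
library(XML)

# Hypothetical, abbreviated stand-in: variable names listed once,
# then one whitespace-separated record per row
sample_xml <- '<dataset>
  <variables>
    <variable>mpg</variable>
    <variable>cyl</variable>
    <variable>disp</variable>
  </variables>
  <record id="Mazda RX4">21.0 6 160.0</record>
  <record id="Datsun 710">22.8 4 108.0</record>
</dataset>'

doc <- xmlParse(sample_xml, asText = TRUE)

# Each record's text becomes one line for read.table();
# extra arguments to xpathSApply() are passed on to xmlGetAttr()
cars <- read.table(
  text      = xpathSApply(doc, "//record", xmlValue),
  col.names = xpathSApply(doc, "//variable", xmlValue),
  row.names = xpathSApply(doc, "//record", xmlGetAttr, "id")
)
str(cars)
```

Because read.table() infers column types, the values arrive as numbers rather than strings.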

Answer 1: (score: 1)

Here is one way with xml2:

library(xml2)
library(purrr)
library(dplyr)

catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml"
doc <- read_xml(catalog_url)

# get all the "records"
plants <- xml_find_all(doc, ".//PLANT")

# get all the field names
kids <- xml_name(xml_children(plants[1]))

# make a data frame
# - iterate over each record
# - in each record grab each field
# - turn each row into a data frame
# - bind all the data frames together

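map_df(plants, function(plant) {
  # xml_find_first() replaces the deprecated xml_find_one();
  # as.data.frame() stands in for the deprecated rbind_list()
  as.data.frame(as.list(setNames(map_chr(kids, function(kid) {
    xml_text(xml_find_first(plant, sprintf(".//%s", kid)))
  }), kids)), stringsAsFactors = FALSE)
})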

## Source: local data frame [36 x 6]
## 
##                 COMMON              BOTANICAL  ZONE        LIGHT PRICE AVAILABILITY
##                  (chr)                  (chr) (chr)        (chr) (chr)        (chr)
## 1            Bloodroot Sanguinaria canadensis     4 Mostly Shady $2.44       031599
## 2            Columbine   Aquilegia canadensis     3 Mostly Shady $9.37       030699
## 3       Marsh Marigold       Caltha palustris     4 Mostly Sunny $6.81       051799
## 4              Cowslip       Caltha palustris     4 Mostly Shady $9.90       030699
## 5  Dutchman's-Breeches    Dicentra cucullaria     3 Mostly Shady $6.44       012099
## 6         Ginger, Wild       Asarum canadense     3 Mostly Shady $9.03       041899
## 7             Hepatica     Hepatica americana     4 Mostly Shady $4.45       012699
## 8            Liverleaf     Hepatica americana     4 Mostly Shady $3.99       010299
## 9   Jack-In-The-Pulpit    Arisaema triphyllum     4 Mostly Shady $3.23       020199
## 10            Mayapple   Podophyllum peltatum     3 Mostly Shady $2.98       060599
## ..                 ...                    ...   ...          ...   ...          ...

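This could be made more robust by looking up all possible child names (some "records" may have more or fewer children), but it is enough for this example. Doing it this way (fetching each element's value by name) ensures the values come back in the correct order, since the order of the elements is not guaranteed.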
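
A minimal sketch of that more robust variant, assuming records may omit fields or list them in a different order: collect the union of child names across all records, then look each name up in every record. The inline XML and the names sample_xml, cols and plants_df are made up for illustration.

```r
library(xml2)

# Hypothetical catalog: the second record lacks PRICE and
# lists its fields in a different order
sample_xml <- '<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE><PRICE>$2.44</PRICE></PLANT>
  <PLANT><ZONE>3</ZONE><COMMON>Columbine</COMMON></PLANT>
</CATALOG>'

doc <- read_xml(sample_xml)
plants <- xml_find_all(doc, ".//PLANT")

# Union of every child element name seen in any record
kids <- unique(xml_name(xml_children(plants)))

# xml_find_first() on a node set returns one result per record;
# a record without the field gives a missing node, whose text is NA
cols <- lapply(kids, function(kid) {
  xml_text(xml_find_first(plants, paste0("./", kid)))
})

plants_df <- as.data.frame(setNames(cols, kids), stringsAsFactors = FALSE)
print(plants_df)
```

Looking fields up by name keeps each value in the right column whatever order it appears in, and a missing field becomes NA instead of shifting the rest of the row.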