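I successfully read the Example 1 XML into R as a data frame object, but I ran into a problem with Example 2. Can anyone suggest R code to convert the data from mtcars.xml into a data frame?
Example 1)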
library(XML)
# Save the URL of the xml file in a variable
xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml"
# Use the xmlTreeParse-function to parse xml file directly from the web
xmlfile <- xmlTreeParse(xml.url)
# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# have a look at the XML-code of the first subnodes:
print(xmltop)[1:2]
# To extract the XML-values from the document, use xmlSApply:
plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
Example 2)
library(XML)
# Save the URL of the xml file in a variable
doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML"))
xmlfile <- xmlTreeParse(doc)
# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# have a look at the XML-code of the first subnodes:
print(xmltop)[1:2]
# To extract the XML-values from the document, use xmlSApply:
mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
Answer 0: (score: 1)
Try xpathSApply, like this:
library(XML)
path <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(path, useInternalNodes = TRUE)
root <- xmlRoot(doc)
read.table(text = xpathSApply(root, "//record", xmlValue),
col.names = xpathSApply(root, "//variable", xmlValue))
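As a self-contained sketch of why this works: each record in mtcars.xml holds its values as whitespace-separated text rather than as child elements, so every record can be handed to read.table as one line of input. The inline XML below is a hypothetical miniature made up for illustration (it is not the actual file contents), and the `id` attribute on each record is an assumption about the layout; if it is present, it can be carried over as row names:

```r
library(XML)

# Hypothetical miniature imitating the layout of mtcars.xml (made-up excerpt)
xml_string <- '<dataset>
  <variables>
    <variable>mpg</variable>
    <variable>cyl</variable>
    <variable>disp</variable>
  </variables>
  <record id="Mazda RX4">21.0 6 160.0</record>
  <record id="Datsun 710">22.8 4 108.0</record>
</dataset>'

# Parse the string once; asText = TRUE marks it as XML content, not a file name
doc <- xmlParse(xml_string, asText = TRUE)

# Each <record> is one line of whitespace-separated values
rows <- xpathSApply(doc, "//record", xmlValue)
vars <- xpathSApply(doc, "//variable", xmlValue)
# Assumed: every record carries an id attribute
ids <- xpathSApply(doc, "//record", xmlGetAttr, "id")

df <- read.table(text = rows, col.names = vars, row.names = ids)
print(df)
```

Note that the document is parsed only once here. In Example 2 of the question, the result of xmlTreeParse is passed to xmlTreeParse a second time, which is probably where the attempt goes wrong.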
Answer 1: (score: 1)
Here is one way with xml2:
library(xml2)
library(purrr)
library(dplyr)
catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml"
doc <- read_xml(catalog_url)
# get all the "records"
plants <- xml_find_all(doc, ".//PLANT")
# get all the field names
kids <- xml_name(xml_children(plants[1]))
# make a data frame
# - iterate over each record
# - in each record grab each field
# - turn each row into a data frame
# - bind all the data frames together
map_df(plants, function(plant) {
  as_tibble(as.list(setNames(map_chr(kids, function(kid) {
    xml_text(xml_find_first(plant, sprintf(".//%s", kid)))
  }), kids)))
})
## Source: local data frame [36 x 6]
##
##                 COMMON              BOTANICAL  ZONE        LIGHT PRICE AVAILABILITY
##                  (chr)                  (chr) (chr)        (chr) (chr)        (chr)
## 1            Bloodroot Sanguinaria canadensis     4 Mostly Shady $2.44       031599
## 2            Columbine   Aquilegia canadensis     3 Mostly Shady $9.37       030699
## 3       Marsh Marigold       Caltha palustris     4 Mostly Sunny $6.81       051799
## 4              Cowslip       Caltha palustris     4 Mostly Shady $9.90       030699
## 5  Dutchman's-Breeches    Dicentra cucullaria     3 Mostly Shady $6.44       012099
## 6         Ginger, Wild       Asarum canadense     3 Mostly Shady $9.03       041899
## 7             Hepatica     Hepatica americana     4 Mostly Shady $4.45       012699
## 8            Liverleaf     Hepatica americana     4 Mostly Shady $3.99       010299
## 9   Jack-In-The-Pulpit    Arisaema triphyllum     4 Mostly Shady $3.23       020199
## 10            Mayapple   Podophyllum peltatum     3 Mostly Shady $2.98       060599
## ..                 ...                    ...   ...          ...   ...          ...
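This could be made more robust by looking up all possible child names (some "records" may have more or fewer children), but it is sufficient for this example. Doing it this way (getting each element's value by name) ensures the values come back in the correct order (element order is not guaranteed).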
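A minimal sketch of that more robust variant, assuming a made-up catalog in which records differ in their children and in child order (the inline XML is hypothetical, not the w3schools file). It uses a vectorised column-by-column lookup instead of the map_df approach above: the field names are the union of child names over all records, and xml_find_first yields a missing node, hence NA text, wherever a record lacks a field:

```r
library(xml2)

# Hypothetical catalog: the second record lacks ZONE, the third reorders
# its children and adds LIGHT
catalog <- read_xml('<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><COMMON>Columbine</COMMON></PLANT>
  <PLANT><ZONE>3</ZONE><COMMON>Cowslip</COMMON><LIGHT>Sunny</LIGHT></PLANT>
</CATALOG>')

plants <- xml_find_all(catalog, ".//PLANT")

# Union of child names over all records, not just the first one
fields <- unique(xml_name(xml_children(plants)))

# One column per field, looked up by name in every record;
# records without that child produce NA
cols <- lapply(fields, function(f) {
  xml_text(xml_find_first(plants, paste0("./", f)))
})
df <- as.data.frame(setNames(cols, fields), stringsAsFactors = FALSE)
print(df)
```

Because every value is fetched by name, the differing child order in the third record does not matter.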