I'm trying to parse some XML result sets (generated from SQL queries sent through a SOAP API) that contain both schema information and data.
I've managed to get the data out with the XML package, but I'm having trouble getting the schema information into the R environment.
library(magrittr)
library(XML)
## Example XML to parse
file <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:complexType name="Row">
<xsd:sequence>
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<Row>
<Column0>foo</Column0>
<Column1>1.2</Column1>
</Row>
<Row>
<Column0>bar</Column0>
<Column1>2.3</Column1>
</Row>
</rowset>
'
## Extract the rows
file %>%
XML::xmlParse() %>%
XML::xmlRoot() %>%
XML::xmlElementsByTagName(.,"Row",TRUE) %>%
xmlToDataFrame() -> DF
print(DF)
This returns:
Column0 Column1
1 foo 1.2
2 bar 2.3
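Ideally, I'd like to extract a second data frame containing the column information, so that I can use it to format my result set correctly. However, the furthest I've got is a list of element nodes. As far as I can tell, these are stored as external pointers, and I'm struggling to pull their contents back into the R environment.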
file %>%
XML::xmlParse() %>%
XML::xmlRoot() %>%
XML::xmlElementsByTagName(.,"element",TRUE)
This produces:
$schema.complexType.sequence.element
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
$schema.complexType.sequence.element
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
What I'm actually aiming for is this:
name type minOccurs maxOccurs saw.sql.type saw.sql.displayFormula
1 Column0 xsd:string 0 1 varchar Description
2 Column1 xsd:string 0 1 numeric Number
(Code that generates this example output:)
data.frame(name = c("Column0","Column1"),
type = "xsd:string",
minOccurs = "0",
maxOccurs="1",
`saw-sql:type`= c("varchar","numeric"),
`saw-sql:displayFormula` = c("Description","Number"))
Any pointers on what I'm missing here would be greatly appreciated!
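The element nodes returned by `xmlElementsByTagName()` are external pointers, but their attributes can be copied out into ordinary R vectors. A minimal sketch, assuming the XML package's `xmlAttrs()` with `addNamespacePrefix = TRUE` (so `saw-sql:type` is not confused with the plain `type` attribute); the names `xml_text`, `nodes`, `attrs` and `schema` are illustrative only, and the XML string is the schema part of the example above.

```r
library(XML)

# Schema part of the example result set (data rows omitted)
xml_text <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:complexType name="Row">
<xsd:sequence>
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
</rowset>'

# Same lookup as in the question: a list of external-pointer nodes
nodes <- xmlElementsByTagName(xmlRoot(xmlParse(xml_text, asText = TRUE)),
                              "element", TRUE)

# Copy each node's attributes into a plain named character vector;
# keeping the namespace prefix gives every attribute a unique name
attrs <- lapply(nodes, xmlAttrs, addNamespacePrefix = TRUE)

# One row per column definition; data.frame() converts "saw-sql:type"
# into the syntactic column name "saw.sql.type"
schema <- data.frame(do.call(rbind, unname(attrs)),
                     row.names = NULL, stringsAsFactors = FALSE)
print(schema)
```

Because the attribute order is the same for every `<xsd:element>` here, `rbind` lines the columns up; a schema where elements carry different attribute sets would need the vectors aligned by name first.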
Answer 0 (score: 1)
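library(XML)
# Helper: pull one named attribute out of each <xsd:element>.
# xmlToList() turns each attribute-only node into a named character vector,
# but drops namespace prefixes, so "type" and "saw-sql:type" are both
# named "type"; `which` selects the 1st or 2nd occurrence.
get_stuff <- function(y, stuff, which = 1) {
  unlist(lapply(y, function(x) x[names(x) == stuff][which]), use.names = FALSE)
}
xml_list <- xmlToList(file)[["schema"]][["complexType"]][["sequence"]]
DF <- data.frame(name = get_stuff(xml_list, "name"),
                 type = get_stuff(xml_list, "type"),
                 minOccurs = get_stuff(xml_list, "minOccurs"),
                 maxOccurs = get_stuff(xml_list, "maxOccurs"),
                 saw_sql_type = get_stuff(xml_list, "type", which = 2),
                 saw_sql_displayFormula = get_stuff(xml_list, "displayFormula"))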
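Looking attributes up by name is where the two `type` attributes collide: the list built by `xmlToList()` names each attribute without its namespace prefix (which is also where `type.1` in the next answer's output comes from). A small check, sketched on a single hypothetical element trimmed from the example, showing what `xmlAttrs()` returns with and without `addNamespacePrefix = TRUE`:

```r
library(XML)

# One column definition trimmed from the example schema
el <- xmlRoot(xmlParse(
  '<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:saw-sql="urn:saw-sql" name="Column1" type="xsd:string"
     saw-sql:type="numeric"/>', asText = TRUE))

# Default: prefixes are dropped, so two attributes share the name "type"
plain <- xmlAttrs(el)
# With the prefix kept, each attribute name is unique
prefixed <- xmlAttrs(el, addNamespacePrefix = TRUE)

print(names(plain))
print(names(prefixed))
print(prefixed[["saw-sql:type"]])
```

With the prefixed names, `saw-sql:type` can be fetched by name instead of by position.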
Answer 1 (score: 0)
data.frame(do.call(rbind, xmlToList(file)$schema$complexType$sequence), row.names=NULL)
name type minOccurs maxOccurs type.1 displayFormula
1 Column0 xsd:string 0 1 varchar Description
2 Column1 xsd:string 0 1 numeric Number
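The question's end goal is to use the schema to format the result set. A minimal sketch of that last step, assuming a hypothetical mapping from `saw-sql:type` values to R coercion functions that covers only `varchar` and `numeric` (the two types in the example; other types are left as text). It builds the schema table with `xmlAttrs()` so `saw-sql:type` keeps its own column, converts each data column, and renames it to its `displayFormula`.

```r
library(XML)

# Full example result set from the question
xml_text <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:complexType name="Row">
<xsd:sequence>
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<Row><Column0>foo</Column0><Column1>1.2</Column1></Row>
<Row><Column0>bar</Column0><Column1>2.3</Column1></Row>
</rowset>'

root <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Data rows, as in the question; every column starts out as character
DF <- xmlToDataFrame(xmlElementsByTagName(root, "Row", TRUE),
                     stringsAsFactors = FALSE)

# Schema table, one row per <xsd:element>, prefixes kept in the names
schema_nodes <- unname(xmlElementsByTagName(root, "element", TRUE))
schema <- data.frame(
  do.call(rbind, lapply(schema_nodes, xmlAttrs, addNamespacePrefix = TRUE)),
  row.names = NULL, stringsAsFactors = FALSE)

# Hypothetical saw-sql:type -> R conversion table (extend as needed)
converters <- list(varchar = as.character, numeric = as.numeric)

for (i in seq_len(nrow(schema))) {
  col <- schema$name[i]
  convert <- converters[[schema$saw.sql.type[i]]]
  if (is.null(convert)) convert <- identity  # unknown type: keep as text
  DF[[col]] <- convert(DF[[col]])
}

# Replace Column0/Column1 with the human-readable displayFormula names
names(DF) <- schema$saw.sql.displayFormula[match(names(DF), schema$name)]
str(DF)
```

Driving the conversion from a lookup table keeps the loop independent of how many columns a given query returns; only the table needs to grow as new `saw-sql:type` values turn up.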