Extracting schema information from XML into a data.frame

Time: 2018-01-29 21:52:10

Tags: r xml

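I am trying to parse some XML result sets (generated from SQL queries passed through a SOAP API) that contain both schema information and data.

I have managed to get the data out using the XML package, but I am struggling to extract the schema information into the R environment.

XML and row extraction example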

library(magrittr)
library(XML)

## Example XML to parse
file <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:complexType name="Row">
      <xsd:sequence>
        <xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
        <xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
  <Row>
    <Column0>foo</Column0>
    <Column1>1.2</Column1>
  </Row>
  <Row>
    <Column0>bar</Column0>
    <Column1>2.3</Column1>
  </Row>
</rowset>
'
## Extract the rows
file %>% 
  XML::xmlParse() %>% 
  XML::xmlRoot() %>%
  XML::xmlElementsByTagName(.,"Row",TRUE) %>% 
  xmlToDataFrame() -> DF

print(DF)

This returns the following:

  Column0 Column1
1     foo     1.2
2     bar     2.3

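Attempted schema extraction

Ideally, I would like to extract a second data frame containing the column information, so that I can use it to format my result set correctly. However, the furthest I have been able to get is a list of the element nodes. As far as I can tell, these are stored as external pointers, and I am struggling to pull their contents back into the R environment.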

file %>% 
  XML::xmlParse() %>% 
  XML::xmlRoot() %>%
  XML::xmlElementsByTagName(.,"element",TRUE) 

This produces:

$schema.complexType.sequence.element
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/> 

$schema.complexType.sequence.element
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/> 

Desired output

What I am really aiming for is the following:

     name       type minOccurs maxOccurs saw.sql.type saw.sql.displayFormula
1 Column0 xsd:string         0         1      varchar            Description
2 Column1 xsd:string         0         1      numeric                 Number

(The example output was generated with)

data.frame(name = c("Column0","Column1"),
           type = "xsd:string",
           minOccurs = "0",
           maxOccurs="1",
           `saw-sql:type`= c("varchar","numeric"),
           `saw-sql:displayFormula` = c("Description","Number"))

Any pointers on what I am missing here would be greatly appreciated!

2 answers:

Answer 0: (score: 1)

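# Pull one attribute from every <xsd:element>. xmlToList() drops namespace
# prefixes, so "type" and "saw-sql:type" both end up named "type"; `nth`
# picks which of the attributes with that name to return.
get_stuff <- function(y, stuff, nth = 1) {
  unlist(lapply(y, function(x) x[names(x) == stuff][[nth]]), use.names = FALSE)
}

xml_list <- xmlToList(file)[["schema"]][["complexType"]][["sequence"]]

DF <- data.frame(name = get_stuff(xml_list, "name"),
                 type = get_stuff(xml_list, "type"),
                 minOccurs = get_stuff(xml_list, "minOccurs"),
                 maxOccurs = get_stuff(xml_list, "maxOccurs"),
                 # second "type" attribute is saw-sql:type
                 saw_sql_type = get_stuff(xml_list, "type", nth = 2),
                 saw_sql_displayFormula = get_stuff(xml_list, "displayFormula"))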
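
Because xmlToList() returns attribute names without their namespace prefix, type and saw-sql:type are hard to tell apart by name. One possible alternative, sketched below, is to select the xsd:element nodes with XPath and read their attributes with xmlAttrs(addNamespacePrefix = TRUE). The variable names (xml_text, nodes, attr_list, schema_df) are made up for this illustration, and the XML is the question's example trimmed to the schema part.

```r
library(XML)

# Schema part of the example document from the question.
xml_text <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:complexType name="Row">
      <xsd:sequence>
        <xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
        <xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
</rowset>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every xsd:element node; the xsd prefix is bound explicitly for XPath.
nodes <- getNodeSet(doc, "//xsd:element",
                    namespaces = c(xsd = "http://www.w3.org/2001/XMLSchema"))

# addNamespacePrefix = TRUE keeps "saw-sql:type" distinct from "type".
attr_list <- lapply(nodes, xmlAttrs, addNamespacePrefix = TRUE)

# One row per element; data.frame() makes the names syntactic
# (e.g. "saw-sql:type" becomes "saw.sql.type").
schema_df <- data.frame(do.call(rbind, attr_list), stringsAsFactors = FALSE)
print(schema_df)
```

This assumes every xsd:element carries the same attributes in the same order, since rbind() takes the column names from the first node only.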

Answer 1: (score: 0)

data.frame(do.call(rbind, xmlToList(file)$schema$complexType$sequence), row.names=NULL)
     name       type minOccurs maxOccurs  type.1 displayFormula
1 Column0 xsd:string         0         1 varchar    Description
2 Column1 xsd:string         0         1 numeric         Number
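
The question mentions wanting the schema in order to format the result set. A minimal sketch of that last step follows, assuming a hand-picked mapping from saw-sql types to R conversion functions. The rows and schema data frames are rebuilt by hand so the snippet runs on its own, and the names converters, sql_type and display are hypothetical.

```r
# Result set and schema as they might look after extraction
# (all columns arrive as character).
rows <- data.frame(Column0 = c("foo", "bar"),
                   Column1 = c("1.2", "2.3"),
                   stringsAsFactors = FALSE)
schema <- data.frame(name = c("Column0", "Column1"),
                     sql_type = c("varchar", "numeric"),
                     display = c("Description", "Number"),
                     stringsAsFactors = FALSE)

# Assumed mapping from saw-sql types to R converters; extend as needed.
converters <- list(varchar = as.character, numeric = as.numeric)

for (i in seq_len(nrow(schema))) {
  col <- schema$name[i]
  convert <- converters[[schema$sql_type[i]]]
  # Columns with an unmapped type are left untouched.
  if (!is.null(convert) && col %in% names(rows)) {
    rows[[col]] <- convert(rows[[col]])
  }
}

# Optionally relabel the columns with the display names from the schema.
names(rows) <- schema$display[match(names(rows), schema$name)]
str(rows)
```

Types that are not in the mapping stay as character, so nothing is silently coerced to NA.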