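Extracting a deeply nested XML structure

Time: 2017-03-21 15:20:48

Tags: r xml

I have the following XML file that I want to parse with R. The XML is deeply nested, and its elements have varying numbers of child nodes.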

<?xml version="1.0" encoding="UTF-8"?>

<Alert date="20161223_2" type="full">
<Records>
<Person Id="100">
  <PersonNameDetails>
    <PersonNames id="Name1">
      <ReferenceGroup ReferenceGroupCode="ABC"/>
      <ReferenceGroup ReferenceGroupCode="DEF"/>
      <PersonNameValue>
        <FirstName>Carl Bangouvounda</FirstName>
        <Surname>Toziz</Surname>
      </PersonNameValue>
    </PersonNames>
    <PersonNames id="Name2">
      <ReferenceGroup ReferenceGroupCode="ABC"/>
      <ReferenceGroup ReferenceGroupCode="GHI" ReferenceGroupLanguageCode="en"/>
      <ReferenceGroup ReferenceGroupCode="JKL"/>
      <ReferenceGroup ReferenceGroupCode="MNO"/>
      <ReferenceGroup ReferenceGroupCode="DEF"/>
      <PersonNameValue>
        <FirstName>Tozize</FirstName>
        <Surname>Bangouvonda</Surname>
      </PersonNameValue>
    </PersonNames>
    <PersonNames id="Name3">
      <ReferenceGroup ReferenceGroupCode="MNO"/>
      <PersonNameValue>
        <FirstName>Carol</FirstName>
        <Surname>Tozize</Surname>
      </PersonNameValue>
    </PersonNames>
    <PersonNames id="Name4">
      <ReferenceGroup ReferenceGroupCode="PQR"/>
      <ReferenceGroup ReferenceGroupCode="MNO"/>
      <PersonNameValue>
        <FirstName>Carol</FirstName>
        <MiddleName>Bangouvonda</MiddleName>
        <Surname>Tozize</Surname>
      </PersonNameValue>
    </PersonNames>
    <PersonNames id="Name5">
      <ReferenceGroup ReferenceGroupCode="GHI" ReferenceGroupLanguageCode="en"/>
      <ReferenceGroup ReferenceGroupCode="JKL"/>
      <ReferenceGroup ReferenceGroupCode="DEF"/>
      <PersonNameValue>
        <FirstName>Carl Bangouvonda</FirstName>
        <Surname>Toziz</Surname>
      </PersonNameValue>
    </PersonNames>
  </PersonNameDetails>
</Person>
</Records>
</Alert>

The expected output is:

-----------------------------------------------------------
Id | id | ReferenceGroup | FirstName | MiddleName | Surname
-----------------------------------------------------------
100 | Name1 | ABC, DEF | Carl Bangouvounda | NA | Toziz 
-----------------------------------------------------------
100 | Name2 | ABC, GHI, JKL, MNO, DEF | Tozize | NA | Bangouvonda
-----------------------------------------------------------
100 | Name3 | MNO | Carol | NA | Tozize
-----------------------------------------------------------
100 | Name4 | PQR, MNO | Carol | Bangouvonda | Tozize
-----------------------------------------------------------
100 | Name5 | GHI, JKL, DEF | Carl Bangouvonda | NA | Toziz
-----------------------------------------------------------

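Id comes from an attribute of the Person element, while all other columns come from within PersonNameDetails. I also want to concatenate the ReferenceGroupCode values that belong to the same PersonNames element into a single string.

Following a suggestion, I converted this to XSLT with the code below: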

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>

  <xsl:template match="/Alert ">
    <xsl:copy>
      <xsl:apply-templates select="Records"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="Records">    
    <xsl:apply-templates select="Person"/>    
  </xsl:template>

  <xsl:template match="Person">    
    <xsl:apply-templates select="PersonNameDetails"/>    
  </xsl:template> 

  <xsl:template match="PersonNameDetails">    
    <xsl:apply-templates select="PersonNames"/>    
  </xsl:template>  

  <xsl:template match="PersonNames">    
    <xsl:apply-templates select="PersonNameValue"/>    
  </xsl:template> 

  <xsl:template match="PersonNameValue">
    <PersonNameValue>
      <Id><xsl:value-of select="ancestor::Person/@Id"/></Id>
      <id><xsl:value-of select="ancestor::PersonNames/@id"/></id>
      <xsl:copy-of select="FirstName"/>
      <MiddleName><xsl:value-of select="MiddleName"/></MiddleName>
      <Surname><xsl:value-of select="Surname"/></Surname>
      <ReferenceGroupCode><xsl:value-of select="ancestor::PersonNames/ReferenceGroup/@ReferenceGroupCode"/></ReferenceGroupCode>
    </PersonNameValue>
  </xsl:template>

</xsl:transform>

How can I change the XSLT code so that ReferenceGroupCode is output as follows?

<ReferenceGroupCode>ABC,DEF</ReferenceGroupCode>

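Any help is much appreciated.

1 answer:

Answer 0 (score: 0)

I'm not sure about XSLT, but you can run XPath on the PersonNames nodes and write a helper function that handles missing or multiple values.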

library(XML)

doc <- xmlParse("<your XML file>")
# One node per output row
x <- getNodeSet(doc, "//PersonNames")

# Evaluate an XPath relative to node x; return NA when nothing matches,
# otherwise join all matches into one comma-separated string
xpath2 <- function(x, ...) {
    y <- xpathSApply(x, ...)
    if (length(y) == 0) NA_character_ else paste(y, collapse = ", ")
}

y <- data.frame(
  id =             sapply(x, xpath2, ".", xmlGetAttr, "id"),
  ReferenceGroup = sapply(x, xpath2, ".//ReferenceGroup", xmlGetAttr, "ReferenceGroupCode"),
  FirstName =      sapply(x, xpath2, ".//FirstName", xmlValue),
  MiddleName =     sapply(x, xpath2, ".//MiddleName", xmlValue),
  Surname =        sapply(x, xpath2, ".//Surname", xmlValue),
  stringsAsFactors = FALSE
)
y
     id          ReferenceGroup         FirstName  MiddleName     Surname
1 Name1                ABC, DEF Carl Bangouvounda        <NA>       Toziz
2 Name2 ABC, GHI, JKL, MNO, DEF            Tozize        <NA> Bangouvonda
3 Name3                     MNO             Carol        <NA>      Tozize
4 Name4                PQR, MNO             Carol Bangouvonda      Tozize
5 Name5           GHI, JKL, DEF  Carl Bangouvonda        <NA>       Toziz
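
To see how a helper like xpath2 treats repeated and missing nodes, here is a minimal sketch on a single PersonNames fragment copied from the question. It assumes the XML package is installed; the names frag, node, codes, middle, and first are introduced only for this illustration, and the helper is written with if/else rather than ifelse.

```r
library(XML)

# One PersonNames element from the question: two ReferenceGroup nodes, no MiddleName
frag <- xmlParse('<PersonNames id="Name1">
  <ReferenceGroup ReferenceGroupCode="ABC"/>
  <ReferenceGroup ReferenceGroupCode="DEF"/>
  <PersonNameValue>
    <FirstName>Carl Bangouvounda</FirstName>
    <Surname>Toziz</Surname>
  </PersonNameValue>
</PersonNames>', asText = TRUE)
node <- xmlRoot(frag)

# Variant of the xpath2 helper: NA for no match, comma-joined string otherwise
xpath2 <- function(x, ...) {
  y <- xpathSApply(x, ...)
  if (length(y) == 0) NA_character_ else paste(y, collapse = ", ")
}

codes  <- xpath2(node, ".//ReferenceGroup", xmlGetAttr, "ReferenceGroupCode")  # several matches
middle <- xpath2(node, ".//MiddleName", xmlValue)                              # no match
first  <- xpath2(node, ".//FirstName", xmlValue)                               # exactly one match
```

Here codes holds the joined string "ABC, DEF", middle is NA because the fragment has no MiddleName, and first is the single FirstName value unchanged.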

Perhaps the Person Id could be added by counting the PersonNames nodes under each Person?

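# Number of PersonNames children under each Person's PersonNameDetails
n <- xpathSApply(doc, "//Person/PersonNameDetails", xmlSize)
# Repeat each Person Id once per PersonNames row (assumes one PersonNameDetails per Person)
y$Id <- rep(xpathSApply(doc, "//Person", xmlGetAttr, "Id"), n)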
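
The counting approach depends on every Person having exactly one PersonNameDetails whose children are all PersonNames. As an alternative (not part of the original answer), the sketch below looks up the enclosing Person for each PersonNames node with the XPath ancestor axis. It assumes the XML package; the sample is trimmed from the question, and the second Person (Id="200", NameX) is hypothetical, added only to show that the lookup works across several persons.

```r
library(XML)

# Trimmed sample; the second Person (Id="200") is hypothetical, added for illustration
xml_text <- '<Alert date="20161223_2" type="full">
<Records>
<Person Id="100">
  <PersonNameDetails>
    <PersonNames id="Name1"><PersonNameValue><Surname>Toziz</Surname></PersonNameValue></PersonNames>
    <PersonNames id="Name3"><PersonNameValue><Surname>Tozize</Surname></PersonNameValue></PersonNames>
  </PersonNameDetails>
</Person>
<Person Id="200">
  <PersonNameDetails>
    <PersonNames id="NameX"><PersonNameValue><Surname>Example</Surname></PersonNameValue></PersonNames>
  </PersonNameDetails>
</Person>
</Records>
</Alert>'

doc <- xmlParse(xml_text, asText = TRUE)
nodes <- getNodeSet(doc, "//PersonNames")

# For each PersonNames node, read the Id attribute of its enclosing Person
person_id <- sapply(nodes, function(node) {
  xmlGetAttr(getNodeSet(node, "ancestor::Person")[[1]], "Id")
})

y <- data.frame(
  Id = person_id,
  id = sapply(nodes, xmlGetAttr, "id"),
  stringsAsFactors = FALSE
)
```

Because each row finds its own Person, the result does not depend on how many PersonNames each Person contains or on the order in which counts are computed.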