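Generic transformation of an XML tree into flat text lines

Asked: 2012-06-29 22:50:54

Tags: c# xml xslt

The solution to this question uses a hard-coded transformation to turn a sample XML tree into a flat, delimited text file: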

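// Requires: using System; using System.IO; using System.Linq; using System.Text;
//           using System.Xml; using System.Xml.XPath;
string orderXml = 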
@"<?xml version='1.0' encoding='utf-8'?>
<Order id='79223510'>
<Status>new</Status>
<ShipMethod>Standard International</ShipMethod>
<ToCity>Tokyo</ToCity>
<Items>
    <Item>
    <SKU>SKU-1234567890</SKU>
    <Quantity>1</Quantity>
    <Price>99.95</Price>
    </Item>
    <Item>
    <SKU>SKU-1234567899</SKU>
    <Quantity>1</Quantity>
    <Price>199.95</Price>
    </Item>
</Items>
</Order>";

StringReader str = new StringReader(orderXml);

var xslt = new XmlTextReader(new StringReader(

    @"<xsl:stylesheet version='1.0' 
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

    <xsl:output method='text' indent='no' media-type='text/plain' />
    <xsl:variable name='newline'><xsl:text>&#13;&#10;</xsl:text></xsl:variable>
    <xsl:variable name='delimiter'>|</xsl:variable>

    <!-- by default, don't copy any nodes to output -->
    <xsl:template match='node()|@*'>
        <xsl:apply-templates select='node()|@*'/>
    </xsl:template>

    <xsl:template match='/Order/Items/Item'>
    <xsl:value-of
        select='concat(
        ../../@id, $delimiter,
        ../../Status, $delimiter,
        ../../ShipMethod, $delimiter,
        ../../ToCity, $delimiter,
        SKU, $delimiter,
        Quantity, $delimiter,
        Price,
        $newline)'
        />
    </xsl:template>
    </xsl:stylesheet>"

                ));

var xDoc = new XPathDocument(str);
var xTr = new System.Xml.Xsl.XslCompiledTransform();
xTr.Load(xslt);

StringBuilder sb = new StringBuilder();
StringWriter writer = new StringWriter(sb);
xTr.Transform(xDoc, null, writer);

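// The stylesheet emits CRLF line ends, so split on both "\r\n" and "\n".
string[] lines = sb.ToString().Split(new string[] {"\r\n", "\n"}, StringSplitOptions.RemoveEmptyEntries);

// WriteLine (not Write) so that each record is printed on its own line.
lines.ToList().ForEach(System.Console.WriteLine);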

It produces the following output:

79223510|new|Standard International|Tokyo|SKU-1234567890|1|99.95
79223510|new|Standard International|Tokyo|SKU-1234567899|1|199.95

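Is there a way to produce the same output with a generic XSL transformation that walks the source XML tree and concatenates parent node and attribute values onto the child nodes?

Notes:

  1. If a node has any attributes, then that node has no value of its own to concatenate.

  2. If a node has more than one attribute, their values should be concatenated using a slash character.

  3. A truly generic solution should be able to handle and flatten XML trees with more than two hierarchy levels.

  4. Here is another sample document, which contains an additional parent node with two attributes: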

    <Order id='79223510'>
    <Status>new</Status>
    <ShipMethod>Standard International</ShipMethod>
    <ToCity>Tokyo</ToCity>
    <Marketplace id="123-45678-9089808" name="MyBooks" />
    <Items>
        <Item>
        <SKU>SKU-1234567890</SKU>
        <Quantity>1</Quantity>
        <Price>99.95</Price>
        </Item>
        <Item>
        <SKU>SKU-1234567899</SKU>
        <Quantity>1</Quantity>
        <Price>199.95</Price>
        </Item>
    </Items>
    </Order>
    

    Here is the desired delimited flat text file output:

    79223510|new|Standard International|Tokyo|123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
    79223510|new|Standard International|Tokyo|123-45678-9089808/MyBooks|SKU-1234567899|1|199.95
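    Notes 1 and 2 amount to a small per-node rule, which can be sketched in C# with LINQ to XML. This only illustrates the rule and is not part of any XSLT solution; the class `FieldRules` and the helper `FieldValue` are hypothetical names introduced for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FieldRules
{
    // Hypothetical helper: returns the text that one node contributes to a line.
    // Note 2: attribute values are joined with '/'.
    // Note 1: a node that has attributes has no text value of its own.
    public static string FieldValue(XElement node)
    {
        return node.HasAttributes
            ? string.Join("/", node.Attributes().Select(a => a.Value))
            : node.Value;
    }

    public static void Main()
    {
        var marketplace = XElement.Parse(
            "<Marketplace id='123-45678-9089808' name='MyBooks' />");
        var toCity = XElement.Parse("<ToCity>Tokyo</ToCity>");

        Console.WriteLine(FieldValue(marketplace)); // 123-45678-9089808/MyBooks
        Console.WriteLine(FieldValue(toCity));      // Tokyo
    }
}
```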
    

    Dimitre Novatchev's solution works both for the original sample document and for an XML document with a deeper node hierarchy.

        string orderXml = 
    //      @"<?xml version='1.0' encoding='utf-8'?>
    //      <Order id='79223510'>
    //      <Status>new</Status>
    //      <ShipMethod>Standard International</ShipMethod>
    //      <ToCity>Tokyo</ToCity>
    //      <Marketplace id='123-45678-9089808' name='MyBooks'/>
    //      <Items>
    //          <Item>
    //          <SKU>SKU-1234567890</SKU>
    //          <Quantity>1</Quantity>
    //          <Price>99.95</Price>
    //          </Item>
    //          <Item>
    //          <SKU>SKU-1234567899</SKU>
    //          <Quantity>1</Quantity>
    //          <Price>199.95</Price>
    //          </Item>
    //      </Items>
    //      </Order>";
    
    @"<?xml version='1.0' encoding='utf-8'?>
    <Order id='79223510'>
        <Status>new</Status>
        <ShipMethod>Standard International</ShipMethod>
        <ToCity>Tokyo</ToCity>
        <Marketplace id=""123-45678-9089808"" name=""MyBooks"" />
        <Items>
            <Item>
            <X>
                <SKU>SKU-1234567890</SKU>
                <Quantity>1</Quantity>
                <Price>99.95</Price>
            </X>
            <X>
                <SKU>SKU-1234554321</SKU>
                <Quantity>1</Quantity>
                <Price>199.95</Price>
            </X>
            </Item>
            <Item>
            <Y>
                <SKU>SKU-0987654321</SKU>
                <Quantity>1</Quantity>
                <Price>299.95</Price>
            </Y>
            <Y>
                <SKU>SKU-0987667890</SKU>
                <Quantity>1</Quantity>
                <Price>399.95</Price>
            </Y>
            </Item>
        </Items>
    </Order>";
    
    StringReader str = new StringReader(orderXml);
    
    var xslt = new XmlTextReader(new StringReader(   
        @"<xsl:stylesheet version='1.0'
            xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
            xmlns:ext='http://exslt.org/common'>
        <xsl:output method='text'/>
        <xsl:strip-space elements='*'/>
    
            <xsl:param name='pLeafNodes' select=
            '//*[not(*[*])
                and
                (
                name() = name(following-sibling::*[1])
                or
                name() = name(preceding-sibling::*[1])
                )
                ]'/>
    
            <xsl:template match='/'>
            <xsl:variable name='vrtfPass1'>
                <t>
                    <xsl:call-template name='StructRepro'/>
                </t>
            </xsl:variable>
    
            <xsl:apply-templates mode='pass2'
                select='ext:node-set($vrtfPass1)/*/*' />
            </xsl:template>
    
        <xsl:template match='Order' mode='pass2'>
        <xsl:apply-templates select='.//@* | .//text()' mode='pass2'/>
        <xsl:text>&#xA;</xsl:text>
        </xsl:template>
    
        <xsl:template match='@*|text()' mode='pass2'>
        <xsl:if test='not(position()=1) and not(self::text())'>/</xsl:if>
        <xsl:if test='not(position()=1) and self::text()'>|</xsl:if>
        <xsl:value-of select='.'/>
        </xsl:template>
    
            <xsl:template name='StructRepro'>
            <xsl:param name='pLeaves' select='$pLeafNodes'/>
    
            <xsl:for-each select='$pLeaves'>
                <xsl:apply-templates mode='build' select='/*'>
                <xsl:with-param name='pChild' select='.'/>
                <xsl:with-param name='pLeaves' select='$pLeaves'/>
                </xsl:apply-templates>
            </xsl:for-each>
            </xsl:template>
    
            <xsl:template mode='build' match='node()|@*'>
                <xsl:param name='pChild'/>
                <xsl:param name='pLeaves'/>
    
                <xsl:copy>
                <xsl:apply-templates mode='build' select='@*'/>
    
                <xsl:variable name='vLeafChild' select=
                    '*[count(.|$pChild) = count($pChild)]'/>
    
                <xsl:choose>
                    <xsl:when test='$vLeafChild'>
                    <xsl:apply-templates mode='build'
                        select='$vLeafChild
                                |
                                node()[not(count(.|$pLeaves) = count($pLeaves))]'>
                        <xsl:with-param name='pChild' select='$pChild'/>
                        <xsl:with-param name='pLeaves' select='$pLeaves'/>
                    </xsl:apply-templates>
                    </xsl:when>
                    <xsl:otherwise>
                    <xsl:apply-templates mode='build' select=
                    'node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                            or
                            .//*[count(.|$pChild) = count($pChild)]
                            ]
                    '>
    
                        <xsl:with-param name='pChild' select='$pChild'/>
                        <xsl:with-param name='pLeaves' select='$pLeaves'/>
                    </xsl:apply-templates>
                            </xsl:otherwise>
                        </xsl:choose>
                        </xsl:copy>
                    </xsl:template>
                    <xsl:template match='text()'/>
                </xsl:stylesheet>"
                    ));
    
    
    //
    // White space cannot be stripped from input documents that have already been loaded. 
    // Provide the input document as an XmlReader instead. 
    //+
    //var xDoc = new XPathDocument(str);
    XmlReaderSettings settings;
    settings = new XmlReaderSettings();
    settings.ConformanceLevel = ConformanceLevel.Document;
    var xDoc = XmlReader.Create(str, settings);
    //-
    
    var xTr = new System.Xml.Xsl.XslCompiledTransform();
    xTr.Load(xslt);
    
    StringBuilder sb = new StringBuilder();
    StringWriter writer = new StringWriter(sb);
    xTr.Transform(xDoc, null, writer);
    
    string[] lines = sb.ToString().Split(new string[] {"\n"}, StringSplitOptions.RemoveEmptyEntries);

    // WriteLine (not Write) so that each record is printed on its own line.
    lines.ToList().ForEach(System.Console.WriteLine);
    
    //  test output 1
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567899|1|199.95
    
    // test output 2
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234554321|1|199.95
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-0987654321|1|299.95
    //  79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-0987667890|1|399.95 
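
The comment in the listing above about white-space stripping concerns stylesheets that contain `xsl:strip-space`: the input is handed to `XslCompiledTransform` as an `XmlReader` rather than as an already loaded document. A minimal sketch of that pattern follows; the one-template stylesheet and the class name `StripSpaceDemo` are assumptions made for the example and are not from the original post.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StripSpaceDemo
{
    // Counts the text nodes the stylesheet sees after xsl:strip-space is applied.
    // The input is passed as an XmlReader so the processor can strip
    // white-space-only text nodes while it builds its own copy of the document.
    public static string CountTextNodes(string inputXml)
    {
        const string stylesheet = @"<xsl:stylesheet version='1.0'
            xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
          <xsl:output method='text'/>
          <xsl:strip-space elements='*'/>
          <xsl:template match='/'>
            <xsl:value-of select='count(//text())'/>
          </xsl:template>
        </xsl:stylesheet>";

        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(xsltReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        {
            transform.Transform(input, null, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Indented input: three white-space-only text nodes plus two real ones.
        string xml = "<Order>\n  <Status>new</Status>\n  <ToCity>Tokyo</ToCity>\n</Order>";
        Console.WriteLine(CountTextNodes(xml)); // 2
    }
}
```

Only the two real text nodes are counted, which is what keeps stray white space out of the flattened lines.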
    

1 Answer:

Answer 0 (score: 0)

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common">
     <xsl:output method="text"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="pLeafNodes" select=
     "//*[not(*[*])
        and
         (
          name() = name(following-sibling::*[1])
         or
          name() = name(preceding-sibling::*[1])
          )
         ]"/>

     <xsl:template match="/">
      <xsl:variable name="vrtfPass1">
          <t>
            <xsl:call-template name="StructRepro"/>
          </t>
      </xsl:variable>

      <xsl:apply-templates mode="pass2"
         select="ext:node-set($vrtfPass1)/*/*" />
     </xsl:template>

 <xsl:template match="Order" mode="pass2">
  <xsl:apply-templates select=".//@* | .//text()" mode="pass2"/>
  <xsl:text>&#xA;</xsl:text>
 </xsl:template>

 <xsl:template match="@*|text()" mode="pass2">
  <xsl:if test="not(position()=1) and not(self::text())">/</xsl:if>
  <xsl:if test="not(position()=1) and self::text()">|</xsl:if>
  <xsl:value-of select="."/>
 </xsl:template>


     <xsl:template name="StructRepro">
       <xsl:param name="pLeaves" select="$pLeafNodes"/>

       <xsl:for-each select="$pLeaves">
         <xsl:apply-templates mode="build" select="/*">
          <xsl:with-param name="pChild" select="."/>
          <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
       </xsl:for-each>
     </xsl:template>

      <xsl:template mode="build" match="node()|@*">
          <xsl:param name="pChild"/>
          <xsl:param name="pLeaves"/>

         <xsl:copy>
           <xsl:apply-templates mode="build" select="@*"/>

           <xsl:variable name="vLeafChild" select=
             "*[count(.|$pChild) = count($pChild)]"/>

           <xsl:choose>
            <xsl:when test="$vLeafChild">
             <xsl:apply-templates mode="build"
                 select="$vLeafChild
                        |
                          node()[not(count(.|$pLeaves) = count($pLeaves))]">
                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
             <xsl:apply-templates mode="build" select=
             "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                    or
                     .//*[count(.|$pChild) = count($pChild)]
                    ]
             ">

                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:otherwise>
           </xsl:choose>
         </xsl:copy>
     </xsl:template>
     <xsl:template match="text()"/>
</xsl:stylesheet>

when applied to the provided XML document:

<Order id='79223510'>
    <Status>new</Status>
    <ShipMethod>Standard International</ShipMethod>
    <ToCity>Tokyo</ToCity>
    <Marketplace id="123-45678-9089808" name="MyBooks" />
    <Items>
        <Item>
            <SKU>SKU-1234567890</SKU>
            <Quantity>1</Quantity>
            <Price>99.95</Price>
        </Item>
        <Item>
            <SKU>SKU-1234567899</SKU>
            <Quantity>1</Quantity>
            <Price>199.95</Price>
        </Item>
    </Items>
</Order>

produces the wanted, correct result:

79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567899|1|199.95

Here is a more complex XML document that I created from the provided one:

<Order id='79223510'>
    <Status>new</Status>
    <ShipMethod>Standard International</ShipMethod>
    <ToCity>Tokyo</ToCity>
    <Marketplace id="123-45678-9089808" name="MyBooks" />
    <Items>
        <Item>
         <X>
            <SKU>SKU-1234567890</SKU>
            <Quantity>1</Quantity>
            <Price>99.95</Price>
         </X>
         <X>
            <SKU>SKU-1234554321</SKU>
            <Quantity>1</Quantity>
            <Price>199.95</Price>
         </X>
        </Item>
        <Item>
         <Y>
            <SKU>SKU-0987654321</SKU>
            <Quantity>1</Quantity>
            <Price>299.95</Price>
         </Y>
         <Y>
            <SKU>SKU-0987667890</SKU>
            <Quantity>1</Quantity>
            <Price>399.95</Price>
         </Y>
        </Item>
    </Items>
</Order>

When the same transformation is applied to this second document, the wanted, correct result is produced again:

79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234554321|1|199.95
79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-0987654321|1|299.95
79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-0987667890|1|399.95
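
The four output lines correspond to the four elements selected by the `pLeafNodes` expression of the stylesheet. As a rough check, the same XPath 1.0 expression can be evaluated from C# with LINQ to XML. This is only a sketch: the class name `LeafNodeDemo` is hypothetical and the input is a shortened version of the second document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LeafNodeDemo
{
    // Same XPath expression as the pLeafNodes parameter of the stylesheet.
    private const string LeafXPath =
        "//*[not(*[*]) and (name() = name(following-sibling::*[1])" +
        " or name() = name(preceding-sibling::*[1]))]";

    // Returns the names of the selected elements.
    public static List<string> LeafNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.XPathSelectElements(LeafXPath)
                  .Select(e => e.Name.LocalName)
                  .ToList();
    }

    public static void Main()
    {
        // Shortened version of the second sample document.
        const string xml = @"<Order id='79223510'>
  <Status>new</Status>
  <Items>
    <Item><X><SKU>A</SKU></X><X><SKU>B</SKU></X></Item>
    <Item><Y><SKU>C</SKU></Y><Y><SKU>D</SKU></Y></Item>
  </Items>
</Order>";
        Console.WriteLine(string.Join(",", LeafNames(xml))); // X,X,Y,Y
    }
}
```

`Status` and `SKU` are not selected because no adjacent sibling shares their name, and `Item` is not selected because its children have children of their own.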

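Explanation:

  1. This is a two-pass transformation.

  2. The first pass converts the source XML document into a shredded document, using the generic shredding solution from this answer. The most important thing here is to specify the "leaf" nodes of the shredding correctly. These are the element nodes that satisfy both conditions: 1) none of their child elements has children of its own; 2) their name is the same as the name of the immediately preceding or the immediately following sibling element.

  3. The intermediate result (for the second, more complex document) is: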

    <t>
       <Order id="79223510">
          <Status>new</Status>
          <ShipMethod>Standard International</ShipMethod>
          <ToCity>Tokyo</ToCity>
          <Marketplace id="123-45678-9089808" name="MyBooks"/>
          <Items>
             <Item>
                <X>
                   <SKU>SKU-1234567890</SKU>
                   <Quantity>1</Quantity>
                   <Price>99.95</Price>
                </X>
             </Item>
          </Items>
       </Order>
       <Order id="79223510">
          <Status>new</Status>
          <ShipMethod>Standard International</ShipMethod>
          <ToCity>Tokyo</ToCity>
          <Marketplace id="123-45678-9089808" name="MyBooks"/>
          <Items>
             <Item>
                <X>
                   <SKU>SKU-1234554321</SKU>
                   <Quantity>1</Quantity>
                   <Price>199.95</Price>
                </X>
             </Item>
          </Items>
       </Order>
       <Order id="79223510">
          <Status>new</Status>
          <ShipMethod>Standard International</ShipMethod>
          <ToCity>Tokyo</ToCity>
          <Marketplace id="123-45678-9089808" name="MyBooks"/>
          <Items>
             <Item>
                <Y>
                   <SKU>SKU-0987654321</SKU>
                   <Quantity>1</Quantity>
                   <Price>299.95</Price>
                </Y>
             </Item>
          </Items>
       </Order>
       <Order id="79223510">
          <Status>new</Status>
          <ShipMethod>Standard International</ShipMethod>
          <ToCity>Tokyo</ToCity>
          <Marketplace id="123-45678-9089808" name="MyBooks"/>
          <Items>
             <Item>
                <Y>
                   <SKU>SKU-0987667890</SKU>
                   <Quantity>1</Quantity>
                   <Price>399.95</Price>
                </Y>
             </Item>
          </Items>
       </Order>
    </t>
    

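  4. The second pass processes the element children of the top element of the intermediate result; this processing is done in "pass2" mode.

    This second-pass processing is rather simple: all descendant attribute and descendant text nodes are processed in document order, and their values are output with a delimiter that corresponds to the node type ("|" for text nodes and "/" for attributes).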
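
The delimiter rule of the second pass can be sketched outside XSLT as well. The C# below is an illustration using LINQ to XML and is not the stylesheet's implementation; the class `Pass2Demo` and the method `Flatten` are hypothetical names. It walks one shredded `Order` element in document order and applies the same rule: nothing before the first value, "/" before an attribute value, "|" before a text value.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class Pass2Demo
{
    // Flattens one shredded element into a single delimited line.
    public static string Flatten(XElement root)
    {
        var parts = new List<string>();
        Visit(root, parts);
        return string.Concat(parts);
    }

    // Document order: an element's attributes come before its child nodes.
    private static void Visit(XElement element, List<string> parts)
    {
        foreach (XAttribute attribute in element.Attributes())
        {
            // XPath's @* does not match namespace declarations, so skip them here too.
            if (attribute.IsNamespaceDeclaration) continue;
            parts.Add((parts.Count > 0 ? "/" : "") + attribute.Value);
        }
        foreach (XNode child in element.Nodes())
        {
            if (child is XText text)
            {
                parts.Add((parts.Count > 0 ? "|" : "") + text.Value);
            }
            else if (child is XElement childElement)
            {
                Visit(childElement, parts);
            }
        }
    }

    public static void Main()
    {
        // First <Order> of the intermediate result shown above.
        // XElement.Parse drops the indentation, so no white-space text nodes remain.
        const string shredded = @"<Order id='79223510'>
  <Status>new</Status>
  <ShipMethod>Standard International</ShipMethod>
  <ToCity>Tokyo</ToCity>
  <Marketplace id='123-45678-9089808' name='MyBooks'/>
  <Items><Item><X>
    <SKU>SKU-1234567890</SKU>
    <Quantity>1</Quantity>
    <Price>99.95</Price>
  </X></Item></Items>
</Order>";
        Console.WriteLine(Flatten(XElement.Parse(shredded)));
        // 79223510|new|Standard International|Tokyo/123-45678-9089808/MyBooks|SKU-1234567890|1|99.95
    }
}
```

Because the delimiter depends only on the type of the current node, the `Marketplace` attributes are attached to the preceding `Tokyo` value with "/" rather than starting a new "|" field, exactly as in the results shown earlier.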