Grouping template matches that have no parent element in XSLT 2

Asked: 2015-02-26 17:22:38

Tags: xml xslt xslt-2.0

I have an unstructured XML document (produced by a Pandoc conversion from docx to DocBook) that I am trying to clean up with XSLT. The XML looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article>
  <articleinfo>
    <title></title>
  </articleinfo>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: John Smith
</para>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: John Smith
</para>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: _________________________
</para>
</article>

I have successfully transformed it with the following XSLT:

<?xml version="1.0"?>

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="fileDateStamp">
        <xsl:analyze-string select="base-uri(.)" regex="\s*(\d\d\d\d\-\d\d\-\d\d)\s*">
            <xsl:matching-substring>
                <xsl:value-of select="regex-group(1)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>       
    </xsl:variable>

    <xsl:template match="/">
        <impactStatements>
            <xsl:apply-templates/>
        </impactStatements>
    </xsl:template>

    <xsl:template match="informaltable/tgroup/thead/row/entry">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*How(.*)\s*">
            <xsl:matching-substring>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <Assisted>
                    <xsl:value-of select="(.)"/>    
                </Assisted>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="informaltable/tgroup/tbody/row/entry">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*What(.*)\s*">
            <xsl:matching-substring>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <Difference>
                    <xsl:value-of select="(.)"/>
                </Difference>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="para">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*Staff Member: ([A-Z].*)\s*">
            <xsl:matching-substring>
                <Staff><xsl:value-of select="regex-group(1)"/></Staff>
                <DateCreated><xsl:value-of select="$fileDateStamp"/></DateCreated>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet> 

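What I am missing is a way to wrap each 'record' in its own element. Because <informaltable> and <para> are both direct children of <article>, with no parent element per record, my very basic XSLT knowledge fails me completely. I get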

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
   <Assisted>Lorem ipsum dolor sit amet.</Assisted>
   <Difference>Lorem ipsum dolor sit amet.</Difference>
   <Staff>John Smith</Staff>
   <DateCreated>2014-01-01</DateCreated>
   <Assisted>Lorem ipsum dolor sit amet.</Assisted>
   <Difference>Lorem ipsum dolor sit amet.</Difference>
   <Staff>John Smith</Staff>
   <DateCreated>2014-01-01</DateCreated>
</impactStatements>

but I want:

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
    <statement>
        <Assisted>Lorem ipsum dolor sit amet.</Assisted>
        <Difference>Lorem ipsum dolor sit amet.</Difference>
        <Staff>John Smith</Staff>
        <DateCreated>2014-01-01</DateCreated>
    </statement>
    <statement>
        <Assisted>Lorem ipsum dolor sit amet.</Assisted>
        <Difference>Lorem ipsum dolor sit amet.</Difference>
        <Staff>John Smith</Staff>
        <DateCreated>2014-01-01</DateCreated>
    </statement>
</impactStatements>

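This is a one-off job, and I know I could change the XML some other way, but I am sure I am just missing some basic knowledge needed to make the XSLT I already have do what I want. I have tried various approaches and searched Google, to no avail. Everything I have tried breaks the format of the XML I generate.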

2 answers:

Answer 0 (score: 2)

I would start by adding a template

<xsl:template match="article">
  <xsl:for-each-group select="*" group-starting-with="informaltable">
    <statement>
      <xsl:apply-templates select="current-group()"/>
    </statement>
  </xsl:for-each-group>
</xsl:template>

For your sample (and after adding <xsl:strip-space elements="*"/> for readability), I get the output

<impactStatements>
   <statement/>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement/>
</impactStatements>

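I am not sure whether the empty statement elements are caused by missing sample data or whether you want to exclude some elements from processing. You will need to explain which elements in the input should create a result statement.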
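
If the blank records should simply be dropped, here is one possible refinement; treat it as a sketch rather than a tested solution. The leading empty statement appears because articleinfo precedes the first informaltable and therefore forms a group of its own, so this version groups only informaltable and para. It also assumes that a record counts as filled in when its para names a staff member starting with a capital letter. That rule is my guess, mirroring the regex in your para template.

<xsl:template match="article">
  <!-- Group only the record-bearing children, so articleinfo no longer opens a group of its own -->
  <xsl:for-each-group select="informaltable | para" group-starting-with="informaltable">
    <!-- Assumption: a record is filled in when its para names a staff member -->
    <xsl:if test="current-group()[self::para][matches(normalize-space(.), 'Staff Member: [A-Z]')]">
      <statement>
        <xsl:apply-templates select="current-group()"/>
      </statement>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>

With your sample this should leave only the two John Smith records. Adjust the test if a blank form can still carry a name.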

Answer 1 (score: 1)

An interesting and well-asked question! Change the template matching / to

<xsl:template match="/article">
    <impactStatements>
    <xsl:for-each select="informaltable">
        <statement>
            <xsl:apply-templates select=". | following-sibling::*[self::para][1]"/>
        </statement>
    </xsl:for-each>
    </impactStatements>
</xsl:template>

The result is:

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement/>
</impactStatements>

I think this is almost right. There is an empty statement at the end because the input contains 3 informaltable elements. How do you want to handle it?
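
One way to drop it, if that is what you want, is to filter the tables before wrapping them. This is only a sketch: it assumes that an answer cell always contains an emphasis without a role attribute (as in your sample), so a table with no such element is treated as a blank form.

<xsl:template match="/article">
    <impactStatements>
        <!-- Assumption: answers are emphasis elements without a role attribute; a table with none is a blank form -->
        <xsl:for-each select="informaltable[.//entry/emphasis[not(@role)]]">
            <statement>
                <!-- The table itself plus the first para that follows it -->
                <xsl:apply-templates select=". | following-sibling::*[self::para][1]"/>
            </statement>
        </xsl:for-each>
    </impactStatements>
</xsl:template>

Because the para is only reached through a selected table, the "Staff Member: ____" line of the blank form is skipped along with it.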