XSLT: select and merge nodes based on their content

Time: 2019-01-28 10:36:50

Tags: xml xslt xpath

Sorry if a similar question has already been asked. I'm still new to XSL and couldn't find a suitable answer.

I'm trying to transform an XML file into another XML file. The problem is that the only nodes I have in the input XML are <p> elements. I have to extract the text content of these elements, create new nodes from it, and then merge some of the other nodes into the new nodes. The second problem is that there is no real consistency in the input XML. I'm really frustrated.

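(The input XML I'm working with is longer than the example given, but it follows the same pattern: divs with the class page, and each div contains Content and Paragraph sections.)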

Input XML:

<p>

This is the output I want to get:

<root>
    <div class="page">
        <p>Content:</p>
        <p>This is the content. </p>
        <p>Content continues. </p>
        <p>End content.</p>
        <p>Paragraph:</p>
        <p>◼ Beginning of new paragraph. </p>
        <p>End of new paragraph.</p>
        <p>◼ New line here.</p>
        <p>Content:</p>
        <p>Heres lies the second content </p>
        <p>Continiuation of the second content. </p>
        <p>Second content ends.</p>
        <p>Paragraph:</p>
        <p>◼ Start of second paragraph. </p>
        <p>Finish of second paragraph.</p>
        <p>◼ This should also be separate.</p>
    </div>
    <div class="page">
        <p>Content:</p>
        <p>Third content starts here. </p>
        <p>Third content continues. </p>
        <p>End content three.</p>
        <p>Paragraph:</p>
        <p>◼ Beginning of third paragraph. </p>
        <p>End of third paragraph.</p>
        <p>◼ And again a new line.</p>
    </div>
</root>

1 answer:

Answer 0 (score: 0)

I'm not sure of the exact logic required, but you probably want to use xsl:for-each-group here.

So first select the p elements, then group them so that a new group starts at every element whose text ends with a colon:

<xsl:for-each-group select="p" group-starting-with="p[ends-with(., ':')]">
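To make the group-starting-with behaviour concrete, here is a rough Python analogue. It is only an illustration, not XSLT and not part of the answer; the helper name group_starting_with is hypothetical. A new group begins at each item that matches the predicate, and, as in XSLT 2.0, the first item always opens a group even if it does not match.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_starting_with(items: Iterable[T], starts_group: Callable[[T], bool]) -> List[List[T]]:
    """Rough analogue of xsl:for-each-group group-starting-with (hypothetical helper)."""
    groups: List[List[T]] = []
    for item in items:
        # The first item always opens a group; later items open one only if they match.
        if not groups or starts_group(item):
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


# String values of the p elements in the first part of the sample page.
texts = [
    "Content:", "This is the content. ", "Content continues. ", "End content.",
    "Paragraph:", "◼ Beginning of new paragraph. ", "End of new paragraph.", "◼ New line here.",
]
# Equivalent of group-starting-with="p[ends-with(., ':')]".
groups = group_starting_with(texts, lambda s: s.endswith(":"))
print([g[0] for g in groups])  # ['Content:', 'Paragraph:']
```

Each group's first item is the heading, which is what the xsl:when tests in the stylesheet compare against.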

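You can then process each group with current-group(). The paragraphs need more work, though, because you need a nested xsl:for-each-group to handle the paragraphs that start with that funny symbol: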

<xsl:for-each-group select="current-group() except ." group-starting-with="p[starts-with(., '◼')]">
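The nested grouping and the separator="" concatenation can be sketched the same way in Python. This is an illustration only; split_paragraphs and its marker parameter are hypothetical names. Dropping the first element plays the role of current-group() except ., and a running count of ◼-prefixed lines gives each line its group number.

```python
from itertools import accumulate, groupby
from typing import List


def split_paragraphs(group: List[str], marker: str = "◼") -> List[str]:
    """Sketch of the nested grouping: one string per pcontent (hypothetical helper)."""
    body = group[1:]  # like "current-group() except .": drop the "Paragraph:" heading
    # Running count of marker lines = group number of each line.
    ids = accumulate(1 if text.startswith(marker) else 0 for text in body)
    # Lines sharing a group number are concatenated with no separator (separator="").
    return ["".join(text for _, text in run)
            for _, run in groupby(zip(ids, body), key=lambda pair: pair[0])]


paragraph_group = [
    "Paragraph:", "◼ Beginning of new paragraph. ", "End of new paragraph.", "◼ New line here.",
]
for pcontent in split_paragraphs(paragraph_group):
    print(pcontent)
```

Because the count only increases, itertools.groupby sees each group as one consecutive run, and lines before the first ◼ (count 0) form their own leading group, matching XSLT's behaviour.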

Try this XSLT:

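<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Needs an XSLT 2.0 processor: xsl:for-each-group, current-group() and ends-with() are not available in XSLT 1.0 -->
  <xsl:output method="xml" indent="yes" />
  <xsl:strip-space elements="*" />

  <!-- No template matches the outer root element, so the built-in rules just process its children and no wrapper element is output -->
  <xsl:template match="div[@class='page']">
    <page>
      <!-- Start a new group at every p whose text ends with a colon ("Content:", "Paragraph:") -->
      <xsl:for-each-group select="p" group-starting-with="p[ends-with(., ':')]">
        <!-- Inside for-each-group, "." is the first item of the group, i.e. the heading -->
        <xsl:choose>
          <xsl:when test=". = 'Content:'">
            <title><xsl:value-of select="." /></title>
            <content>
              <!-- Concatenate the rest of the group with no separator -->
              <xsl:value-of select="current-group() except ." separator="" />
            </content>
          </xsl:when>
          <xsl:when test=". = 'Paragraph:'">
            <paragraph><xsl:value-of select="." /></paragraph>
            <!-- Nested grouping: each p starting with ◼ opens a new pcontent -->
            <xsl:for-each-group select="current-group() except ." group-starting-with="p[starts-with(., '◼')]">
              <pcontent>
                <xsl:value-of select="current-group()" separator="" />
              </pcontent>
            </xsl:for-each-group>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each-group>
    </page>
  </xsl:template>
</xsl:stylesheet>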
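If no XSLT 2.0 processor is at hand, the same logic can be checked with a rough Python equivalent built on xml.etree.ElementTree from the standard library. It is a sketch that assumes the input looks like the sample in the question (div elements with class page containing only p children); transform and group_starting_with are hypothetical names. It returns the page elements as a list because the stylesheet itself emits them without a wrapper element.

```python
import xml.etree.ElementTree as ET
from itertools import accumulate, groupby
from typing import Callable, List


def group_starting_with(items: List[str], starts: Callable[[str], bool]) -> List[List[str]]:
    """New group at each matching item; leading non-matching items form their own group."""
    ids = accumulate(1 if starts(x) else 0 for x in items)
    return [[x for _, x in run] for _, run in groupby(zip(ids, items), key=lambda p: p[0])]


def transform(doc: ET.Element) -> List[ET.Element]:
    """Python sketch of the answer's stylesheet; returns one <page> per div.page."""
    pages = []
    for div in doc.iter("div"):
        if div.get("class") != "page":
            continue
        page = ET.Element("page")
        # String value of each child p, like XPath's "." on a p element.
        texts = ["".join(p.itertext()) for p in div.findall("p")]
        for group in group_starting_with(texts, lambda s: s.endswith(":")):
            head, rest = group[0], group[1:]
            if head == "Content:":
                ET.SubElement(page, "title").text = head
                ET.SubElement(page, "content").text = "".join(rest)
            elif head == "Paragraph:":
                ET.SubElement(page, "paragraph").text = head
                for run in group_starting_with(rest, lambda s: s.startswith("◼")):
                    ET.SubElement(page, "pcontent").text = "".join(run)
        pages.append(page)
    return pages


sample = (
    '<root><div class="page">'
    "<p>Content:</p><p>This is the content. </p><p>Content continues. </p><p>End content.</p>"
    "<p>Paragraph:</p><p>◼ Beginning of new paragraph. </p><p>End of new paragraph.</p>"
    "<p>◼ New line here.</p></div></root>"
)
for page in transform(ET.fromstring(sample)):
    print(ET.tostring(page, encoding="unicode"))
```

As with xsl:choose without xsl:otherwise, a group whose heading is neither "Content:" nor "Paragraph:" is silently skipped. Unlike the stylesheet, this sketch does not indent its output.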