Splitting elements in XML into separate files

Time: 2016-08-17 10:00:24

Tags: java xml xslt

I have an XML file with many nested topic elements. For example:

<?xml version="1.0" encoding="UTF-8"?>
<topic id="topic-1">
    <title>ADBT</title>

    <para>The program executes a database request by using the ADBT
        library. The ADBT library prepares
        the request and calls an ODBC driver
        or a native API.  
    </para>

    <topic id="topic_wom_eqy_ev">
        <title>Establishing a connection</title>
        <para>
            In order to use a database with ADBT, the first step to be taken
            is
            to establish a
            connection.
        </para>

    </topic>
    <topic id="topic_dsw_gqy_ev">
        <title>Querying a database</title>
        <para>Querying a database involves a number of stages.</para>
        <topic id="topic_ljf_isy_ev">
            <title>Stage one: create a query</title>
            <para> A new query (ADBT_Select object) can only be created starting
                from a previously
                established connection. A query is created using
                the CreateSelect method in two
                different
                ways:
            </para>
        </topic>
    </topic>

</topic>

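I want to split each topic into its own XML file, named after its title. If a topic contains another topic, the child topic goes into a separate file, and the parent topic goes into its own file without the child topics. In this example, that gives four output files with the following content: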

File 1:

<topic id="topic-1">
        <title>ADBT</title>

        <para>The program executes a database request by using the ADBT
            library. The ADBT library prepares
            the request and calls an ODBC driver or a native API.  
        </para>
    </topic>

File 2:

<topic id="topic_wom_eqy_ev">
        <title>Establishing a connection</title>
        <para>
            In order to use a database with ADBT, the first step to be taken is
            to establish a
            connection. 
        </para>     

    </topic>

File 3:

<topic id="topic_dsw_gqy_ev">
        <title>Querying a database</title>
        <para>Querying a database involves a number of stages.</para>
</topic>

File 4:

<topic id="topic_ljf_isy_ev">
            <title>Stage one: create a query</title>
            <para> A new query (ADBT_Select object) can only be created starting
                from a previously
                established connection. A query is created using the CreateSelect method in two
                different
                ways:
            </para>
            </topic>

I have written a few functions, but I can't figure out how to separate topics that are nested several levels deep.

2 Answers:

Answer 0 (score: 1)

Basically, what you want to do is:

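  • Read the XML with the XML reader of your choice
  • Recursively collect all <topic> elements in the document
  • For each <topic> element, create a copy of it (for example, a new Document per element, with that <topic> element as its root), copying all of the original element's children except those with tagName = topic. This guarantees that the recursion never produces overlapping elements
  • Serialize each Document created this way to a file, using the XML writer of your choice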

So, in schematic code:

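// Schematic outline: the helper methods are placeholders you would implement yourself.
Document document = readXMLDocument(...);
// every <topic> element, however deeply nested
List<Element> topicElements = readTopicElementsRecursively(document);
List<Document> splitTopicDocuments = new ArrayList<>();
for (Element el : topicElements) {
    // copy el into its own Document, skipping its <topic> children
    Document doc = copyElementWithoutTopicChildren(el);
    splitTopicDocuments.add(doc);
}
// serialize each Document to its own file
writeTopicDocuments(splitTopicDocuments);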
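The outline can be filled in with the DOM and JAXP classes that ship with the JDK. This is a minimal sketch, assuming the whole input fits in memory. The class name TopicSplitter, the split, parse and fileNameFor helpers, the extra output-directory parameter on writeTopicDocuments, and the file-naming rule are illustrative choices, not part of the answer. The naming rule replaces every character other than letters, digits, space, '_' and '-' with '_', so "Stage one: create a query" becomes "Stage one_ create a query.xml", because ':' is not allowed in Windows file names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TopicSplitter {

    /** Parses XML text into a DOM Document (hypothetical helper, handy for testing). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Every <topic> element at any depth, in document order. */
    static List<Element> readTopicElementsRecursively(Document document) {
        NodeList nodes = document.getElementsByTagName("topic");
        List<Element> topics = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            topics.add((Element) nodes.item(i));
        }
        return topics;
    }

    /** New Document rooted at a copy of el, leaving out its direct <topic> children. */
    static Document copyElementWithoutTopicChildren(Element el) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // Shallow import: copies the element name and its attributes (e.g. id), no children.
        Element root = (Element) doc.importNode(el, false);
        doc.appendChild(root);
        for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
            boolean isTopic = child.getNodeType() == Node.ELEMENT_NODE
                    && "topic".equals(((Element) child).getTagName());
            if (!isTopic) {
                root.appendChild(doc.importNode(child, true)); // deep copy of title, para, text
            }
        }
        return doc;
    }

    /** One Document per <topic>, parents first (hypothetical helper). */
    static List<Document> split(Document document) throws Exception {
        List<Document> result = new ArrayList<>();
        for (Element el : readTopicElementsRecursively(document)) {
            result.add(copyElementWithoutTopicChildren(el));
        }
        return result;
    }

    /** File name from the topic's own <title> (assumed naming rule, see above). */
    static String fileNameFor(Document doc) {
        NodeList titles = doc.getDocumentElement().getElementsByTagName("title");
        String title = titles.getLength() > 0 ? titles.item(0).getTextContent().trim() : "untitled";
        return title.replaceAll("[^A-Za-z0-9 _-]", "_") + ".xml";
    }

    /** Serializes each Document into dir, one file per topic. */
    static void writeTopicDocuments(List<Document> docs, File dir) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        for (Document doc : docs) {
            try (OutputStream out = new FileOutputStream(new File(dir, fileNameFor(doc)))) {
                transformer.transform(new DOMSource(doc), new StreamResult(out));
            }
        }
    }

    // Usage: java TopicSplitter input.xml [outputDir]
    public static void main(String[] args) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new File(args[0]));
        File dir = new File(args.length > 1 ? args[1] : ".");
        writeTopicDocuments(split(document), dir);
    }
}
```

Only direct <topic> children are dropped from each copy; topics nested deeper disappear along with their enclosing child topic, and each still gets its own file because getElementsByTagName returns topics at every depth. Two topics with the same title would overwrite each other's file, so you may want to append the id attribute to the name in that case.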

Answer 1 (score: 1)

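With XSLT 2.0, which is available in Java through Saxon 9, you can use the stylesheet below. The empty template for topic keeps nested topics out of each copy, and the output files are named by position (topic1.xml, topic2.xml, and so on) rather than by title: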

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:template match="/">
        <xsl:for-each select="//topic">
            <xsl:result-document href="topic{position()}.xml">
                <xsl:call-template name="identity"/>
            </xsl:result-document>          
        </xsl:for-each>
    </xsl:template>

    <xsl:template match="@* | node()" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="topic"/>

</xsl:stylesheet>