I have an XML file with many nested topic elements. For example:
<?xml version="1.0" encoding="UTF-8"?>
<topic id="topic-1">
<title>ADBT</title>
<para>The program executes a database request by using the ADBT
library. The ADBT library prepares
the request and calls an ODBC driver
or a native API.
</para>
<topic id="topic_wom_eqy_ev">
<title>Establishing a connection</title>
<para>
In order to use a database with ADBT, the first step to be taken
is
to establish a
connection.
</para>
</topic>
<topic id="topic_dsw_gqy_ev">
<title>Querying a database</title>
<para>Querying a database involves a number of stages.</para>
<topic id="topic_ljf_isy_ev">
<title>Stage one: create a query</title>
<para> A new query (ADBT_Select object) can only be created starting
from a previously
established connection. A query is created using
the CreateSelect method in two
different
ways:
</para>
</topic>
</topic>
</topic>
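I want to split each topic into a new XML file whose file name matches its title. If a topic contains another topic, the child topic becomes a separate file and the parent topic becomes a separate file whose content does not include the child topic. For example, in this case there would be four output files with the following contents:
File 1: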
<topic id="topic-1">
<title>ADBT</title>
<para>The program executes a database request by using the ADBT
library. The ADBT library prepares
the request and calls an ODBC driver or a native API.
</para>
</topic>
File 2:
<topic id="topic_wom_eqy_ev">
<title>Establishing a connection</title>
<para>
In order to use a database with ADBT, the first step to be taken is
to establish a
connection.
</para>
</topic>
File 3:
<topic id="topic_dsw_gqy_ev">
<title>Querying a database</title>
<para>Querying a database involves a number of stages.</para>
</topic>
File 4:
<topic id="topic_ljf_isy_ev">
<title>Stage one: create a query</title>
<para> A new query (ADBT_Select object) can only be created starting
from a previously
established connection. A query is created using the CreateSelect method in two
different
ways:
</para>
</topic>
I have written a few functions, but I cannot figure out how to separate multi-level nested topics.
Answer 0 (score: 1)
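Basically, what you want to do is:
1. Recursively read all <topic> elements.
2. For each <topic> element, create a copy of that element (probably a new Document per element, with the <topic> element as its root), copying all children of the original element except those with tagName = topic. This guarantees that the recursive traversal does not produce overlapping elements.
3. For each resulting Document, serialize it to a file using an XML writer of your choice.
So, as schematic code: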
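// Schematic only: the helper methods below are placeholders to be implemented.
Document document = readXMLDocument(...);
// Collect every <topic> element at any nesting depth.
List<Element> topicElements = readTopicElementsRecursively(document);
List<Document> splitTopicDocuments = new ArrayList<>();
for (Element el : topicElements) {
    // Copy the element into its own Document, skipping child <topic> elements.
    Document doc = copyElementWithoutTopicChildren(el);
    splitTopicDocuments.add(doc);
}
// Serialize each Document to its own file.
writeTopicDocuments(splitTopicDocuments);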
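The schematic steps above could be filled in with the standard JAXP DOM API roughly as follows. This is a minimal sketch, not the answerer's actual code: the class name TopicSplitter and the helpers split, titleOf, fileNameFor, and writeAll are hypothetical. Replacing characters such as ':' with '_' in file names is also an assumption, made because a title like "Stage one: create a query" is not a valid file name on every platform.

```java
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; all names here are illustrative.
class TopicSplitter {

    /** Returns one Document per <topic> (at any depth), keyed by its title text. */
    static Map<String, Document> split(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Note: topics with identical titles would overwrite each other in this map.
        Map<String, Document> result = new LinkedHashMap<>();
        // getElementsByTagName walks all nesting levels in document order.
        NodeList topics = source.getElementsByTagName("topic");
        for (int i = 0; i < topics.getLength(); i++) {
            Element topic = (Element) topics.item(i);
            Document out = builder.newDocument();
            // Shallow import: copies the element and its attributes, not its children.
            Element root = (Element) out.importNode(topic, false);
            out.appendChild(root);
            for (Node c = topic.getFirstChild(); c != null; c = c.getNextSibling()) {
                boolean isChildTopic = c instanceof Element && "topic".equals(c.getNodeName());
                if (!isChildTopic) {
                    root.appendChild(out.importNode(c, true)); // deep copy of everything else
                }
            }
            result.put(titleOf(root), out);
        }
        return result;
    }

    /** Text of the first <title> child, falling back to the id attribute. */
    static String titleOf(Element topic) {
        for (Node c = topic.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && "title".equals(c.getNodeName())) {
                return c.getTextContent().trim();
            }
        }
        return topic.getAttribute("id");
    }

    /** Assumption: replace anything outside a conservative character set with '_'. */
    static String fileNameFor(String title) {
        return title.replaceAll("[^A-Za-z0-9._ -]", "_") + ".xml";
    }

    /** Serializes each Document into dir, one file per topic. */
    static void writeAll(Map<String, Document> docs, Path dir) throws Exception {
        Files.createDirectories(dir);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        for (Map.Entry<String, Document> e : docs.entrySet()) {
            try (OutputStream os = Files.newOutputStream(dir.resolve(fileNameFor(e.getKey())))) {
                transformer.transform(new DOMSource(e.getValue()), new StreamResult(os));
            }
        }
    }
}
```

Under these assumptions, calling TopicSplitter.writeAll(TopicSplitter.split(xml), dir) on the sample input would produce four files, one per topic, with each parent file omitting its nested topics.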
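Answer 1 (score: 1)
With XSLT 2.0, which is available for Java through Saxon 9, you can use the stylesheet below. Note that it names the output files by position (topic1.xml, topic2.xml, ...) rather than by title.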
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:template match="/">
<xsl:for-each select="//topic">
<xsl:result-document href="topic{position()}.xml">
<xsl:call-template name="identity"/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="@* | node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="topic"/>
</xsl:stylesheet>