修改XML的标记

时间:2013-08-06 10:20:45

标签: java xml xslt

我正在寻找动态修改非常大的XML文件标记的最佳方法。

考虑以下输入XML:

输入

<?xml version="1.0" encoding="UTF-8"?>
<rootTag>
   <dictionary>
      <name>field1</name>
      <address>field2</address>
      <gender>field3</gender>
      .
      .
      <postcode>field30</postcode>
   </dictionary>
   <records>
      <record>
         <field id="field1">John</field>
         <field id="field2">Svalbard</field>
         <field id="field3">M</field>
         .
         .
         <field id="field30">12345</field>
      </record>
      .
      .
      <record>
      .
      .
      </record>
   </records>
</rootTag>

XML文件包含一个顶部的字典和一大块记录节点,其标签链接到字典。

我想将每个记录节点中的标记替换为字典中的相应值。因此,输出应如下所示:

输出

<?xml version="1.0" encoding="UTF-8"?>
<rootTag>
   <records>
      <record>
         <name>John</name>
         <address>Svalbard</address>
         <gender>M</gender>
         .
         .
         <postcode>12345</postcode>
      </record>
      .
      .
      <record>
      .
      .
      </record>
   </records>
</rootTag>

请记住,有大量<record>个节点,在Java中实现这种转换的最佳方法是什么?

请注意,我只想更改标签而不是属性。

6 个答案:

答案 0 :(得分:1)

我同意@PeterJaloveczki认为xslt可能就是这样。以下可以完成这项工作

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dictionary" />

    <xsl:template match="field">
        <xsl:variable name="id" select="@id" />
        <xsl:variable name="tagName" select="/rootTag/dictionary/node()[. = $id]/name()" />

        <xsl:element name="{if ($tagName != '') then $tagName else 'field'}">
            <xsl:apply-templates select="node() | @*[name() != 'id']" />
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>

在某些方面进行了简化,因为xml示例也得到了简化,但基本上它应该可以工作。

答案 1 :(得分:0)

可能使用XSLT是你最好的选择。

答案 2 :(得分:0)

我可能会使用SAX XML解析器,这将确保您不会立即加载整个DOM树。

简而言之,您首先要填充一个字典,然后在解析它们时逐个填充每个标记,将其名称替换为包含的字典。

关于如何在Java中处理SAX配对的示例: http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html

答案 3 :(得分:0)

一种选择是使用StAX,它具有高性能,它将xml作为流处理而不将整个xml加载到内存中,并且使用起来很方便。

答案 4 :(得分:0)

SAX Parser是一种可行的方式,因为它将XML解析为流而不是一次性读取它。 有关详细信息,请参阅此处:http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html

答案 5 :(得分:0)

为什么不手动解析XML?

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import junit.framework.Assert;

import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReplaceTextInXmlTest
{
   @Test
   public void test(
   ) {
      try {

         final String inputXml = new String(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<rootTag>\n" +
            "   <dictionary>\n" +
            "      <name>field1</name>\n" +
            "      <address>field2</address>\n" +
            "      <gender>field3</gender>\n" +
            "   </dictionary>\n" +
            "   <records>\n" +
            "      <record>\n" +
            "         <field id=\"field1\">John</field>\n" +
            "         <field id=\"field2\">Svalbard</field>\n" +
            "         <field id=\"field3\">M</field>\n" +
            "      </record>\n" +
            "         <field id=\"field1\">Fritz</field>\n" +
            "         <field id=\"field2\">Hamburg</field>\n" +
            "         <field id=\"field3\">M</field>\n" +
            "      </record>\n" +
            "   </records>\n" +
            "</rootTag>"
         );
         final Map<Integer, String> mapping = new HashMap<>();
         final int start = inputXml.indexOf("<dictionary>");
         final int end = inputXml.indexOf("</dictionary>", start) + 13; // "</dictionary>".length() = 13
         final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
         final DocumentBuilder db = dbf.newDocumentBuilder();
         Document dom = null;
         try (
            ByteArrayInputStream is = new ByteArrayInputStream(inputXml.substring(start,    end).getBytes());
         ) {
            dom = db.parse(is);
         }
         final Element root = dom.getDocumentElement();
         final NodeList nodes = root.getChildNodes();
         for(int i = 0, z = nodes.getLength(); i < z; ++i) {
            final Node node = nodes.item(i);
            final int type = node.getNodeType();
            if(type == 1) {
               final String name = node.getNodeName();
               final String value = node.getTextContent();
               mapping.put(new Integer(Integer.parseInt(value.substring(5))), name);  // "field".length() = 5
            }
         }

         final Pattern fieldPattern = Pattern.compile("^(\\s*<)field id=\"field([0-9]+)\"   (>[^<]*</)field(>\\s*)$");
         final StringBuilder outputXml = new StringBuilder();
         try (
            BufferedReader reader = new BufferedReader(new StringReader(inputXml));
         ) {
            String line = null;
            while ((line = reader.readLine()) != null) {
               final Matcher match = fieldPattern.matcher(line);
               if(match.find() == true) {
                  final int fieldId = Integer.parseInt(match.group(2));
                  final String tagName = mapping.get(new Integer(fieldId));
                  outputXml.append(match.group(1));
                  outputXml.append(tagName);
                  outputXml.append(match.group(3));
                  outputXml.append(tagName);
                  outputXml.append(match.group(4));
               } else {
                  outputXml.append(line);
               }
               outputXml.append('\n');
            }
         }

         final String expectedXml = new String(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<rootTag>\n" +
            "   <dictionary>\n" +
            "      <name>field1</name>\n" +
            "      <address>field2</address>\n" +
            "      <gender>field3</gender>\n" +
            "   </dictionary>\n" +
            "   <records>\n" +
            "      <record>\n" +
            "         <name>John</name>\n" +
            "         <address>Svalbard</address>\n" +
            "         <gender>M</gender>\n" +
            "      </record>\n" +
            "         <name>Fritz</name>\n" +
            "         <address>Hamburg</address>\n" +
            "         <gender>M</gender>\n" +
            "      </record>\n" +
            "   </records>\n" +
            "</rootTag>\n"
         );
         Assert.assertEquals(expectedXml, outputXml.toString());

      } catch (final Exception e) {
         Assert.fail(e.getMessage());
      }
   }
}