Splitting XML using a SAX parser

Time: 2016-11-09 16:19:41

Tags: java xml xpath sax

I have the following XML file.

<Engineers>
    <Engineer>
        <Name>JOHN</Name>
        <Position>STL</Position>
        <Team>SS</Team>
    </Engineer>
    <Engineer>
        <Name>UDAY</Name>
        <Position>TL</Position>
        <Team>SG</Team>
    </Engineer>
    <Engineer>
        <Name>INDRA</Name>
        <Position>Director</Position>
        <Team>PP</Team>
    </Engineer>
</Engineers>

Given the XPath Engineers/Engineer, I need to split this XML into smaller XML strings.

The smaller XML strings should look like this:

<Engineers>
    <Engineer>
        <Name>INDRA</Name>
        <Position>Director</Position>
        <Team>PP</Team>
    </Engineer>
</Engineers>

<Engineers>
    <Engineer>
        <Name>JOHN</Name>
        <Position>STL</Position>
        <Team>SS</Team>
    </Engineer>
</Engineers>

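So far I have implemented the following with SAX. It lets me read the elements in the XML, but it does not produce the output I want. What should I do?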

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadSAX
{
    public static void main( String[] args )
    {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            // Handler that only prints each SAX event as it arrives
            DefaultHandler handler = new DefaultHandler() {

                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {
                    System.out.println("Start Element :" + qName);
                }

                @Override
                public void endElement(String uri, String localName,
                        String qName)
                        throws SAXException {
                    System.out.println("End Element :" + qName);
                }

                @Override
                public void characters(char[] ch, int start, int length)
                        throws SAXException {
                    System.out.println(new String(ch, start, length));
                }
            };

            File file = new File("c:\\file.xml");
            InputStream inputStream = new FileInputStream(file);
            Reader reader = new InputStreamReader(inputStream, "UTF-8");

            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8");

            saxParser.parse(is, handler);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2 answers:

Answer 0: (score: 1)

Why use such a low-level coding approach?

In XSLT 2.0, it is simply

<xsl:template match="/">
  <xsl:for-each select="Engineers/Engineer">
    <xsl:result-document href="{position()}.xml">
      <Engineers>
        <xsl:copy-of select="."/>
      </Engineers>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template> 

If that needs too much memory, use a streaming XSLT 3.0 processor to solve the problem.
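
Where no XSLT 2.0 or 3.0 processor is available, the same split can also be approximated with plain SAX, holding only one Engineer element in memory at a time. The following is a minimal sketch rather than the approach recommended above: the class name SaxSplitter and the methods split and escape are hypothetical, it handles only a two-step path (root/child), and it assumes the elements carry no attributes or namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSplitter {

    /** Returns one "<root><child>...</child></root>" string per child element of the root. */
    static List<String> split(String xml, String root, String child) throws Exception {
        List<String> parts = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private int depth = 0;             // nesting depth of the current element
            private boolean inRoot = false;    // true once the document element matched `root`
            private StringBuilder buf = null;  // non-null while inside a matching child

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                depth++;
                if (depth == 1) {
                    inRoot = qName.equals(root);
                } else if (depth == 2 && inRoot && qName.equals(child)) {
                    buf = new StringBuilder().append('<').append(root).append('>');
                }
                if (buf != null) {
                    buf.append('<').append(qName).append('>'); // attributes are ignored in this sketch
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (buf != null) {
                    buf.append(escape(new String(ch, start, length)));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (buf != null) {
                    buf.append("</").append(qName).append('>');
                    if (depth == 2) {          // the matching child just closed: emit one document
                        buf.append("</").append(root).append('>');
                        parts.add(buf.toString());
                        buf = null;
                    }
                }
                depth--;
            }
        });
        return parts;
    }

    /** Re-escapes the characters the parser already decoded. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Engineers><Engineer><Name>JOHN</Name></Engineer>"
                   + "<Engineer><Name>UDAY</Name></Engineer></Engineers>";
        for (String part : split(xml, "Engineers", "Engineer")) {
            System.out.println(part);
        }
    }
}
```

Whitespace between elements is passed through characters() and therefore preserved inside each fragment. For large files, each part could be written to its own output file as soon as the child closes, instead of being collected in a list.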

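Answer 1: (score: 0)

I think what you need is the cut-and-paste capability of VTD-XML. The paper linked below, on the performance of Java APIs for XML processing, will tell you more about VTD-XML.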

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

import com.ximpleware.*;
import java.io.*;
public class splitXML {
    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        if (!vg.parseFile("d:\\xml\\input.xml", false)){
            System.out.println("error");
            return;
        }
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
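        // XML names are case-sensitive, so match the document's <Engineers>/<Engineer>
        ap.selectXPath("/Engineers/Engineer");
        int i=0,n=0;
        FileOutputStream fos =null;
        byte[] stag="<Engineers>".getBytes();
        byte[] etag="</Engineers>".getBytes();
        while((i=ap.evalXPath())!=-1){
            // open the output file before writing the wrapper start tag
            fos = new FileOutputStream("d:\\xml\\output"+(++n)+".xml");
            fos.write(stag);
            // lower 32 bits: byte offset of the fragment; upper 32 bits: its length
            long l = vn.getElementFragment();
            fos.write(vn.getXML().getBytes(), (int)l, (int)(l>>32));
            fos.write(etag);
            fos.close();
        }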
    }
}
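
The write call above treats the long returned by getElementFragment() as two packed ints: the byte offset in the lower 32 bits and the fragment length in the upper 32 bits. The sketch below illustrates only that bit layout in plain Java, without VTD-XML. The class FragmentBits and its pack, offset, and length helpers are hypothetical names for illustration and are not part of the VTD-XML API.

```java
import java.nio.charset.StandardCharsets;

public class FragmentBits {

    /** Packs an offset (lower 32 bits) and a length (upper 32 bits) into one long. */
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    /** Lower 32 bits: where the fragment starts in the byte array. */
    static int offset(long packed) {
        return (int) packed;
    }

    /** Upper 32 bits: how many bytes the fragment spans. */
    static int length(long packed) {
        return (int) (packed >> 32);
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b>x</b></a>".getBytes(StandardCharsets.UTF_8);
        // "<b>x</b>" starts at byte 3 and is 8 bytes long
        long fragment = pack(3, 8);
        String slice = new String(doc, offset(fragment), length(fragment), StandardCharsets.UTF_8);
        System.out.println(slice); // prints "<b>x</b>"
    }
}
```

Because the fragment is addressed by offset and length into the original byte array, each piece can be copied to its output without re-serializing the element.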