IndexOutOfBoundsException when processing an empty CDATA section with a Transformer

Time: 2015-01-19 15:34:35

Tags: java xml stax

I want to extract specific nodes from a large XML file. This works well until a CDATA section with no content shows up.

Output:

ERROR:  ''
javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
    at xml_test.XML_Test.extractXML2(XML_Test.java:698)
    at xml_test.XML_Test.main(XML_Test.java:811)
Caused by: java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
    ... 3 more
---------
java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
    at xml_test.XML_Test.extractXML2(XML_Test.java:698)
    at xml_test.XML_Test.main(XML_Test.java:811)

Code:

InputStream stream = new FileInputStream("C:\\myFile.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(stream);

TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();

String extractPath = "/root";
String path = "";

while(reader.hasNext()) {
    reader.next();

    if(reader.isStartElement()) {
        path += "/" + reader.getLocalName();

        if(path.equals(extractPath)) {
            StringWriter writer = new StringWriter();
            StAXSource src = new StAXSource(reader);
            StreamResult res = new StreamResult(writer);
            t.transform(src, res); // Exception thrown

            System.out.println(writer.toString());

            path = path.substring(0, path.lastIndexOf("/"));
        }
    }
    else if(reader.isEndElement()) {
        path = path.substring(0, path.lastIndexOf("/"));
    }
}

XML that triggers the error:

<foo><![CDATA[]]></foo>

Can I make the Transformer ignore it? Or what would an alternative implementation look like? I cannot change the input XML!

2 answers:

Answer 0 (score: 4)

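This is an issue in the Xerces implementation; see: https://issues.apache.org/jira/browse/XERCESJ-1033

It appears that the parser does not cope with an empty CDATA section, so the only suggestions I can offer are:

  1. Switch to a different XML parser implementation.
  2. Remove the empty CDATA from the source file (replace "<![CDATA[]]>" with ""),
    or put a space inside the CDATA, e.g. <![CDATA[ ]]>
  3. Use a different API. I have added an example below.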

    JAXB

    With JAXB you can map XML to POJOs in a simple way.

    For example, suppose you have the following XML file at C:\myFile.xml:

    <root>
      <foo><![CDATA[]]></foo>
      <foo><![CDATA[some data here]]></foo>
    </root>
    

    You can map it to the following POJOs:

    @XmlRootElement
    public class Root {
    
      @XmlElement(name="foo")
      private List<Foo> foo;
    
      public List<Foo> getFooList() {
        return foo;
      }
    
      public void setFooList(List<Foo> fooList) {
        this.foo = fooList;
      }
    
    }
    
    @XmlType(name = "foo")
    public class Foo {
    
      @XmlValue
      private String content;
    
      @Override
      public String toString() {
        return content;
      }
    
    }
    

    Then use the following snippet to unmarshal the XML into objects:

    public static void main(String[] args) {
        try {
    
            File file = new File("C:\\myFile.xml");
            JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
    
            Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
            Root root = (Root) jaxbUnmarshaller.unmarshal(file);
    
            for (Foo foo : root.getFooList()) {
                System.out.println(String.format("Foo content: |%s|", foo));
            }
    
        } catch (JAXBException e) {
            e.printStackTrace();
        }
    }
    

    I tested this and it ran without errors.
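
If the input cannot be changed and the parser cannot be swapped, another option is to wrap the reader. The stack trace in the question shows the exception coming from getTextCharacters, called by StAXStream2SAX.handleCharacters. The following is only a sketch and rests on an assumption: that the failure is a bounds check rejecting a zero-length copy. The class name EmptyTextSafeReader and the helper copyRoot are made up for illustration; StreamReaderDelegate is the standard javax.xml.stream.util wrapper class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical wrapper: answers zero-length copy requests itself instead of
// forwarding them to the underlying reader.
class EmptyTextSafeReader extends StreamReaderDelegate {

    EmptyTextSafeReader(XMLStreamReader reader) {
        super(reader);
    }

    @Override
    public int getTextCharacters(int sourceStart, char[] target, int targetStart, int length)
            throws XMLStreamException {
        // Assumption: the underlying reader rejects a zero-length target array.
        // Copying zero characters needs no delegate call, so report 0 copied.
        if (length == 0) {
            return 0;
        }
        return super.getTextCharacters(sourceStart, target, targetStart, length);
    }

    // Copies the root element of the given XML text through an identity Transformer.
    static String copyRoot(String xml) throws Exception {
        XMLStreamReader reader = new EmptyTextSafeReader(
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
        reader.nextTag(); // move from START_DOCUMENT to the root START_ELEMENT

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        t.transform(new StAXSource(reader), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copyRoot("<root><foo><![CDATA[]]></foo></root>"));
    }
}
```

In the question's loop, the only change would be to wrap the reader once, right after createXMLStreamReader, and use the wrapped instance everywhere else.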

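Answer 1 (score: 0)

I ran into this error with two builds of the same application: one failed when handling an empty <![CDATA[]]>, while the other did not.

It turned out that the broken build used Xerces (bundled with the JRE), while the working build had an additional dependency on its classpath: https://mvnrepository.com/artifact/org.codehaus.woodstox/woodstox-core-asl

The relevant part of the stack trace for the broken build is: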

java.lang.Exception
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1144)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
        at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
        at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
        at javax.xml.validation.Validator.validate(Validator.java:124)

and for the working build:

java.lang.Exception
    at com.ctc.wstx.sr.BasicStreamReader.getTextCharacters(BasicStreamReader.java:894)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
    at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
    at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
    at javax.xml.validation.Validator.validate(Validator.java:124)
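
The two traces above differ only in the top frame, which is the StAX reader implementation. A quick way to see which implementation a given build picks up, without provoking an exception, is to print the factory classes that the standard lookup returns. This is a minimal sketch; the class name XmlImplementationCheck and its method names are made up for illustration.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which XML implementations the factory lookup resolves to.
class XmlImplementationCheck {

    // Class name of the StAX input factory found on the current classpath.
    static String staxFactoryName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Class name of the Transformer factory found on the current classpath.
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX input factory:  " + staxFactoryName());
        System.out.println("Transformer factory: " + transformerFactoryName());
    }
}
```

With only the JDK on the classpath, the StAX factory is normally a com.sun.xml.internal.stream class. With Woodstox present it is normally a com.ctc.wstx.stax class, which matches the com.ctc.wstx frame in the second trace.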

This question and its answers helped me feel "comfortable" with Woodstox: What is the relation between fasterxml(jackson-dataformat-xml) and Woodstox?