Validating an XML document while marshalling with JAXB and Stax

Time: 2010-03-18 15:40:39

Tags: java xml xsd jaxb stax

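I've created an XML schema (foo.xsd) and used xjc to create my binding classes for JAXB. Let's say the root element is collection and I am writing N document objects, which are complex types.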

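Because I plan to write out large XML files, I am using Stax to write the collection root element and JAXB to marshal the document subtrees with Marshaller.marshal(JAXBElement, XMLEventWriter). This is the approach recommended by jaxb's unofficial user's guide.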
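The streaming skeleton described above might look roughly like the following sketch. It uses only the JDK's StAX API so that it runs without JAXB on the classpath: the document children are written by hand, and a comment marks where the Marshaller.marshal(JAXBElement, XMLEventWriter) call would go. The class and method names (CollectionWriterSketch, writeCollection) are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

final class CollectionWriterSketch {

    /** Streams a collection root with n document children and returns the XML text. */
    static String writeCollection(int n) {
        StringWriter out = new StringWriter();
        try {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();

            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", "collection"));
            for (int i = 0; i < n; i++) {
                // With JAXB available, each subtree would be written here instead, e.g.
                // marshaller.marshal(documentElement, writer), with Marshaller.JAXB_FRAGMENT
                // set to TRUE so that no extra XML declaration is emitted.
                writer.add(events.createStartElement("", "", "document"));
                writer.add(events.createCharacters("document " + i));
                writer.add(events.createEndElement("", "", "document"));
            }
            writer.add(events.createEndElement("", "", "collection"));
            writer.add(events.createEndDocument());
            writer.close();
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch's signature simple.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeCollection(3));
    }
}
```

Only one child is in flight at a time, which is what keeps memory use flat for large files. In a real run the StringWriter would be replaced by a file or socket stream.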

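My question is: how can I validate the XML while it is being marshalled? If I bind a schema to the JAXB marshaller (using Marshaller.setSchema()), I get validation errors because I am only marshalling a subtree (it complains that it is not seeing the collection root element). I suppose what I really want to do is bind a schema to the Stax XMLEventWriter or something similar.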

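Any comments on this overall approach would be helpful. Basically, I want to be able to use JAXB to marshal and unmarshal large XML documents without running out of memory, so if there is a better way to do this, please let me know.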

2 answers:

Answer 0 (score: 3)

Some Stax implementations seem to be able to validate output. See the following answer to a similar question:

Using Stax2 with Woodstox
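As a point of comparison that does not depend on any particular Stax implementation, the JDK's javax.xml.validation package can also validate a document as a stream of SAX events. The sketch below is not the Woodstox approach from the linked answer; it swaps in a ValidatorHandler and feeds it the root element and children by hand. The inline schema is a hypothetical stand-in for foo.xsd, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class StreamingValidationSketch {

    // Hypothetical stand-in for foo.xsd: a collection of string-valued document elements.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='collection'><xs:complexType><xs:sequence>"
        + "<xs:element name='document' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Sends a collection with n children named childName through the validator. */
    static boolean isValid(String childName, int n) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the first validation error is thrown as an exception.
            ValidatorHandler handler = schema.newValidatorHandler();
            AttributesImpl noAttrs = new AttributesImpl();

            handler.startDocument();
            handler.startElement("", "collection", "collection", noAttrs);
            for (int i = 0; i < n; i++) {
                // One child at a time; nothing is buffered between iterations.
                handler.startElement("", childName, childName, noAttrs);
                char[] text = ("item " + i).toCharArray();
                handler.characters(text, 0, text.length);
                handler.endElement("", childName, childName);
            }
            handler.endElement("", "collection", "collection");
            handler.endDocument();
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("document children valid: " + isValid("document", 3));
        System.out.println("unknown children valid:  " + isValid("oops", 1));
    }
}
```

Because the validator sees the collection start tag before any child, it does not complain about a missing root. With JAXB, Marshaller.marshal(Object, ContentHandler) with Marshaller.JAXB_FRAGMENT set could presumably send each subtree to the same handler, though this sketch only validates and would still need a separate path for writing the output.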

Answer 1 (score: 1)

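You can make your root collection lazy and instantiate items only when the Marshaller calls Iterator.next(). A single call to marshal() will then produce one huge, validated XML document. You won't run out of memory, because beans that have already been serialized are collected by the GC.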

It is also fine to return null as a collection element when an item needs to be skipped conditionally. There will be no NPE.

The XML schema validator itself seems to consume very little memory, even on huge XML documents.
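The lazy-collection idea can be seen in isolation, without JAXB, in the following minimal sketch. It assumes nothing beyond java.util: an AbstractList builds each element inside get(), and a counter (added here purely for illustration) shows that nothing is created until the list is iterated. The names LazyListSketch and lazyItems are hypothetical.

```java
import java.util.AbstractList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyListSketch {

    /** Returns a fixed-size list view whose elements are built on every get() call. */
    static List<String> lazyItems(final int size, final AtomicInteger created) {
        return new AbstractList<String>() {
            @Override
            public String get(final int index) {
                // The element exists only from this point on; once the caller
                // drops the reference, it becomes eligible for garbage collection.
                created.incrementAndGet();
                return "item-" + index;
            }

            @Override
            public int size() {
                return size;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        List<String> items = lazyItems(5, created);
        System.out.println("created before iteration: " + created.get());

        // AbstractList's iterator calls get(i) from next(), one element per step.
        Iterator<String> it = items.iterator();
        it.next();
        it.next();
        System.out.println("created after two next() calls: " + created.get());
    }
}
```

The list reports its full size up front but never holds more than the element currently being handed out, which is the property the marshaller relies on in this approach.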

See JAXB's ArrayElementProperty.serializeListBody()

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.namespace.QName;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "TestHuge")
public class TestHuge {

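    // When true, the Header element is emitted last instead of first, so validation is expected to fail.
    static final boolean MISPLACE_HEADER = true;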

    private static final int LIST_SIZE = 20000;

    static final String HEADER = "Header";

    static final String DATA = "Data";

    @XmlElement(name = HEADER)
    String header;

    @XmlElement(name = DATA)
    List<String> data;

    @XmlAnyElement
    List<Object> content;

    public static void main(final String[] args) throws Exception {

        final JAXBContext jaxbContext = JAXBContext.newInstance(TestHuge.class);

        final Schema schema = genSchema(jaxbContext);

        final Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setSchema(schema);

        final TestHuge instance = new TestHuge();

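        // Lazy list: each element is created on demand in get(), so the full content is never held in memory.
        instance.content = new AbstractList<Object>() {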

            @Override
            public Object get(final int index) {
                return instance.createChild(index);
            }

            @Override
            public int size() {
                return LIST_SIZE;
            }
        };

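        // The output is discarded by a no-op Writer; only the validation result matters here.
        // With MISPLACE_HEADER = true this throws MarshalException ... Invalid content was found starting with element 'Header'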
        marshaller.marshal(instance, new Writer() {

            @Override
            public void write(final char[] cbuf, final int off, final int len) throws IOException {}

            @Override
            public void write(final int c) throws IOException {}

            @Override
            public void flush() throws IOException {}

            @Override
            public void close() throws IOException {}
        });

    }

    private JAXBElement<String> createChild(final int index) {
        if (index % 1000 == 0) {
            System.out.println("serialized so far: " + index);
        }
        final String tag = index == getHeaderIndex(content) ? HEADER : DATA;

        // Large payload (1,000,000 chars per element) so that memory pressure would show up quickly.
        final String bigStr = new String(new char[1000000]);
        return new JAXBElement<String>(new QName(tag), String.class, bigStr);
    }

    private static int getHeaderIndex(final List<?> list) {
        return MISPLACE_HEADER ? list.size() - 1 : 0;
    }

    private static Schema genSchema(final JAXBContext jc) throws Exception {
        final List<StringWriter> outs = new ArrayList<>();
        jc.generateSchema(new SchemaOutputResolver() {

            @Override
            public Result createOutput(final String namespaceUri, final String suggestedFileName)
                    throws IOException {
                final StringWriter out = new StringWriter();
                outs.add(out);
                final StreamResult streamResult = new StreamResult(out);
                streamResult.setSystemId("");
                return streamResult;
            }
        });
        final StreamSource[] sources = new StreamSource[outs.size()];
        for (int i = 0; i < outs.size(); i++) {
            final StringWriter out = outs.get(i);
            sources[i] = new StreamSource(new StringReader(out.toString()));
        }
        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = sf.newSchema(sources);
        return schema;
    }
}