I created an XML schema (foo.xsd) and used xjc to generate my JAXB binding classes. Say the root element is collection, and I am writing N document objects, which are complex types.
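Because I plan to write out large XML files, I use StAX to write the collection root element and JAXB to marshal the document subtrees with Marshaller.marshal(JAXBElement, XMLEventWriter). This is the approach recommended by JAXB's unofficial user's guide.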
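The split described above (StAX for the root, something else for each child) can be sketched with the JDK's StAX API alone. This is a minimal sketch, not the asker's code: FragmentWriter is a hypothetical hook invented for illustration. With JAXB on the classpath, the assumption is that its body would call marshaller.marshal(element, writer) on a Marshaller that has Marshaller.JAXB_FRAGMENT set to true, so that no second XML declaration is written.

```java
import java.io.StringWriter;
import java.util.Iterator;
import java.util.stream.IntStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxRootSketch {

    /** Hypothetical hook: writes one child subtree to the shared StAX writer. */
    public interface FragmentWriter<T> {
        void write(T item, XMLStreamWriter out) throws XMLStreamException;
    }

    /** Writes the root element with StAX and delegates every child to the hook. */
    public static <T> void writeCollection(Iterator<T> items, FragmentWriter<T> fragmentWriter,
            XMLStreamWriter out) throws XMLStreamException {
        out.writeStartDocument();
        out.writeStartElement("collection");
        while (items.hasNext()) {
            // Only the current item is referenced here, so earlier ones can be collected.
            fragmentWriter.write(items.next(), out);
        }
        out.writeEndElement();
        out.writeEndDocument();
        out.flush();
    }

    /** Renders n small document elements into a string, for demonstration only. */
    public static String render(int n) throws XMLStreamException {
        StringWriter sink = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
        Iterator<Integer> ids = IntStream.range(0, n).boxed().iterator();
        writeCollection(ids, (id, w) -> {
            // Stand-in for a JAXB fragment marshal of one document object.
            w.writeStartElement("document");
            w.writeCharacters("doc " + id);
            w.writeEndElement();
        }, out);
        out.close();
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(3));
    }
}
```

For a real large file the StringWriter would be replaced by a file or socket stream; it is used here only to keep the sketch self-contained.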
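My question is: how can I validate the XML while marshalling it? If I bind the schema to the JAXB marshaller (using Marshaller.setSchema()), I get validation errors because I am only marshalling a subtree (it complains that it did not see the collection root element). I think what I really want is to bind the schema to the StAX XMLEventWriter, or something similar.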
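One candidate for "something similar" is javax.xml.validation.ValidatorHandler, which validates a stream of SAX events as they arrive. The sketch below uses only JDK classes and emits the events by hand. The inline XSD is a hypothetical stand-in for foo.xsd, and the child content is simplified to a string. With JAXB, the assumption (not exercised here) is that each document would be marshalled into the same handler via Marshaller.marshal(Object, ContentHandler) with Marshaller.JAXB_FRAGMENT enabled and without calling setSchema() on the marshaller, so the validator sees the root element followed by every subtree.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingValidationSketch {

    // Hypothetical schema standing in for foo.xsd: a collection of document elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='collection'><xs:complexType><xs:sequence>"
      + "<xs:element name='document' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Builds a handler that validates incoming SAX events and throws on the first error. */
    public static ValidatorHandler newHandler() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler handler = schema.newValidatorHandler();
        handler.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // fail fast instead of silently continuing
            }
        });
        return handler;
    }

    /** Emits a collection root and count children named childName into the handler. */
    public static void emit(ValidatorHandler handler, String childName, int count)
            throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "collection", "collection", noAttributes);
        for (int i = 0; i < count; i++) {
            // With JAXB, a fragment marshal into the handler would replace these calls.
            handler.startElement("", childName, childName, noAttributes);
            char[] text = ("doc " + i).toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", childName, childName);
        }
        handler.endElement("", "collection", "collection");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        emit(newHandler(), "document", 3);
        System.out.println("valid stream accepted");
        try {
            emit(newHandler(), "bogus", 1);
        } catch (SAXException e) {
            System.out.println("invalid stream rejected: " + e.getMessage());
        }
    }
}
```

ValidatorHandler is itself a ContentHandler and can forward events to another one through setContentHandler(), so validation and output could share a single pass.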
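Any comments on this overall approach would be helpful. Basically, I want to be able to use JAXB to marshal and unmarshal large XML documents without running out of memory, so if there is a better way, please let me know.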
Answer 0 (score: 3)
Some StAX implementations seem to be able to validate their output. See the answer to the following similar question:
Answer 1 (score: 1)
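You can make the root collection lazy and instantiate each item only when the Marshaller calls Iterator.next(). A single call to marshal() then produces one huge, validated XML document. You will not run out of memory, because beans that have already been serialized are no longer referenced and can be garbage collected.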
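The lazy-collection idea can be seen in isolation, without JAXB, in the following minimal sketch. lazyList and the counter are names made up for illustration; the only mechanism relied on is that AbstractList's iterator calls get(index) from next(), so an element exists only from the moment the consumer asks for it.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.IntFunction;

public class LazyListSketch {

    /** Read-only list that builds element i only when get(i) is called. */
    public static <T> List<T> lazyList(final int size, final IntFunction<T> factory) {
        return new AbstractList<T>() {
            @Override
            public T get(int index) {
                if (index < 0 || index >= size) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                // Nothing is cached, so the list never holds a reference to the item.
                return factory.apply(index);
            }

            @Override
            public int size() {
                return size;
            }
        };
    }

    public static void main(String[] args) {
        int[] created = {0}; // counts how many elements have been built so far
        List<String> documents = lazyList(5, i -> {
            created[0]++;
            return "document " + i;
        });
        System.out.println("created before iteration: " + created[0]);
        for (String document : documents) {
            System.out.println(document);
        }
        System.out.println("created after iteration: " + created[0]);
    }
}
```

Because get() rebuilds the element on every call, this only suits consumers that read each index once, which is the access pattern assumed for the marshaller here.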
Also, if you need to skip elements conditionally, returning null as a collection element is fine. There will be no NPE.
The XML schema validator itself also seems to use very little memory, even on large XML.
See JAXB's ArrayElementProperty.serializeListBody().
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.namespace.QName;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "TestHuge")
public class TestHuge {

    // When true, the Header element is emitted last, which violates the schema.
    static final boolean MISPLACE_HEADER = true;
    private static final int LIST_SIZE = 20000;
    static final String HEADER = "Header";
    static final String DATA = "Data";

    @XmlElement(name = HEADER)
    String header;

    @XmlElement(name = DATA)
    List<String> data;

    // Filled with a lazy list in main(); items are created only when requested.
    @XmlAnyElement
    List<Object> content;

    public static void main(final String[] args) throws Exception {
        final JAXBContext jaxbContext = JAXBContext.newInstance(TestHuge.class);
        final Schema schema = genSchema(jaxbContext);
        final Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setSchema(schema);

        final TestHuge instance = new TestHuge();
        // Lazy list: each element is built on demand and never stored.
        instance.content = new AbstractList<Object>() {
            @Override
            public Object get(final int index) {
                return instance.createChild(index);
            }

            @Override
            public int size() {
                return LIST_SIZE;
            }
        };

        // With MISPLACE_HEADER = true this throws
        // MarshalException ... Invalid content was found starting with element 'Header'
        // The Writer discards all output so that only marshalling and validation use memory.
        marshaller.marshal(instance, new Writer() {
            @Override
            public void write(final char[] cbuf, final int off, final int len) throws IOException {}

            @Override
            public void write(final int c) throws IOException {}

            @Override
            public void flush() throws IOException {}

            @Override
            public void close() throws IOException {}
        });
    }

    // Creates one large child element and reports progress every 1000 items.
    private JAXBElement<String> createChild(final int index) {
        if (index % 1000 == 0) {
            System.out.println("serialized so far: " + index);
        }
        final String tag = index == getHeaderIndex(content) ? HEADER : DATA;
        final String bigStr = new String(new char[1000000]);
        return new JAXBElement<String>(new QName(tag), String.class, bigStr);
    }

    private static int getHeaderIndex(final List<?> list) {
        return MISPLACE_HEADER ? list.size() - 1 : 0;
    }

    // Generates the XML schema from the JAXB context, in memory, and compiles it.
    private static Schema genSchema(final JAXBContext jc) throws Exception {
        final List<StringWriter> outs = new ArrayList<>();
        jc.generateSchema(new SchemaOutputResolver() {
            @Override
            public Result createOutput(final String namespaceUri, final String suggestedFileName)
                    throws IOException {
                final StringWriter out = new StringWriter();
                outs.add(out);
                final StreamResult streamResult = new StreamResult(out);
                streamResult.setSystemId("");
                return streamResult;
            }
        });
        final StreamSource[] sources = new StreamSource[outs.size()];
        for (int i = 0; i < outs.size(); i++) {
            final StringWriter out = outs.get(i);
            sources[i] = new StreamSource(new StringReader(out.toString()));
        }
        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = sf.newSchema(sources);
        return schema;
    }
}