我需要使用有限的内存使用来验证大xml。到目前为止,我找到的每个代码都会出现内存错误。
我试过的方法:
//method 1
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource(inputXml));
//method2
XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema vs = sf.createSchema(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd"));
XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory2.newInstance().createXMLStreamReader(new FileInputStream(inputXml));
sr.validateAgainst(vs);
try {
while (sr.hasNext()) {
sr.next();
}
System.out.println("Validated ok!");
} catch (XMLValidationException ve) {
System.err.println("Validation problem: "+ve);
isValid = false;
}
sr.close();
//方法3
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
String fileName = Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile();
Schema schema = factory.newSchema(new File(fileName));
Validator validator = schema.newValidator();
// create a source from a file
StreamSource source = new StreamSource(new File(inputXml));
// check input
validator.validate(source);
我每次都得到OutOfMemory
修改
使用XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
Builder builder = new Builder(reader);
builder.build(new FileInputStream(new File(inputXml)));
仍然内存使用率非常高,为15mb xml - 250mb的堆 堆栈跟踪:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleCharacters(XMLSchemaValidator.java:1574)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.characters(XMLSchemaValidator.java:789)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:441)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
修改 我的xml有大的base64字符串
答案 0 :(得分:3)
请看Marco Tedone see here关于XML解组的这篇文章。 基于他的结论,我建议低内存消耗STax:
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(fileInputStream);
Validator validator = schema.newValidator();
validator.validate(new StAXSource(xmlStreamReader));
答案 1 :(得分:0)
内存可能用于架构,而不是源文档。您还没有说过架构。有些人可能会使用非常大量的内存,例如,如果内容模型中有大量有限值minOccurs或maxOccurs。在什么时候出现内存不足异常?