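Because of the size of the files to be processed (50-100 MB XML files), this may be beyond what the Java VM can do.
I currently have a set of XML files that arrive as zips; they are all unzipped, and then all the XML in a directory is processed one file at a time using SAX.
To save time and space (the compression ratio is about 1:10), I was wondering whether there is a way to pass a ZipFileEntry that is an XML file to a SAX handler.
I have seen this done with DocumentBuilder and other XML parsing methods, but for performance (and especially memory) I am sticking with SAX.
Currently I am using SAX in the following way: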
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
MyHandler handler = new MyHandler();
for( String curFile : xmlFiles )
{
    System.out.println( "\n\n\t>>>>> open " + curFile + " <<<<<\n");
    saxParser.parse( "file://" + new File( dirToProcess + curFile ).getAbsolutePath(), handler );
}
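Answer 0 (score: 7)
You can parse XML using an InputStream as the source. So you can open a ZipFile, get the InputStream of the entry you want, and then parse it. See the getInputStream method.
---- Edit ----
Here is some code to guide you: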
for( String curFile : xmlFiles )
{
    // try-with-resources closes the archive even if parsing fails
    try (ZipFile zip = new ZipFile(new File( dirToProcess + curFile))) {
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()){
            ZipEntry entry = entries.nextElement();
            if (entry.isDirectory()) continue; // directory entries carry no XML
            try (InputStream xmlStream = zip.getInputStream(entry)) {
                saxParser.parse( xmlStream, handler );
            }
        }
    }
}
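For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name `ZipSaxDemo`, the `CountingHandler`, the `.xml` name filter, and the sample archive contents are all assumptions made for illustration; they are not part of the question's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ZipSaxDemo {

    /** Hypothetical handler: counts start tags instead of doing real work. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses every .xml entry straight from the archive; returns entry name -> element count. */
    static Map<String, Integer> countElements(Path zipPath) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Assumption: XML entries can be recognised by their file extension.
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue;
                }
                CountingHandler handler = new CountingHandler();
                // The entry is decompressed on the fly while the parser reads it.
                try (InputStream in = zip.getInputStream(entry)) {
                    parser.parse(in, handler);
                }
                counts.put(entry.getName(), handler.elements);
            }
        }
        return counts;
    }

    /** Writes a tiny sample archive to a temp file so the sketch can run anywhere. */
    static Path writeSampleZip() throws IOException {
        Path zipPath = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            put(out, "a.xml", "<root><item/><item/></root>");
            put(out, "notes.txt", "not xml");
            put(out, "b.xml", "<root/>");
        }
        return zipPath;
    }

    private static void put(ZipOutputStream out, String name, String body) throws IOException {
        out.putNextEntry(new ZipEntry(name));
        out.write(body.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
    }

    public static void main(String[] args) throws Exception {
        Path zipPath = writeSampleZip();
        System.out.println(countElements(zipPath));
        Files.deleteIfExists(zipPath);
    }
}
```

Nothing is extracted to disk here, and the parser only ever holds its own read buffer, so memory use should stay roughly independent of the entry size.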
Answer 1 (score: 1)
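ZipInputStream.read() reads x bytes from the ZipFileEntry, decompresses them, and gives you the decompressed bytes. Create a connected in/out stream pair, give the InputStream end to the parser, and write the decompressed data to the other end (the OutputStream).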
--- Edit ---
This is what I mean:
import java.io.File;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Main {

    static class MyRunnable implements Runnable {
        private final InputStream xmlStream;
        private final SAXParser sParser;

        public MyRunnable(SAXParser p, InputStream is) {
            sParser = p;
            xmlStream = is;
        }

        public void run() {
            try {
                sParser.parse(xmlStream, new DefaultHandler() {
                    public void startElement(String uri, String localName, String qName, Attributes attributes)
                            throws SAXException {
                        System.out.println("\nStart Element :" + qName);
                    }

                    public void endElement(String uri, String localName, String qName) throws SAXException {
                        System.out.println("\nEnd Element :" + qName);
                    }
                });
                System.out.println("Done parsing..");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // deliberately tiny so the chunk-by-chunk hand-off is visible in the output
    static final int BUF_SIZE = 5;

    public static void main(String[] argv) {
        try {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            ZipFile zip = new ZipFile(new File("D:\\Workspaces\\Indigo\\Test\\performance.zip"));
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // in stream for parser..
                PipedInputStream xmlStream = new PipedInputStream();
                // out stream attached to in stream above.. we would read from zip file and write to this..
                // thus passing whatever we write to the parser..
                PipedOutputStream out = new PipedOutputStream(xmlStream);
                // Parser blocks on the in stream, so put it on a different thread..
                Thread parserThread = new Thread(new Main.MyRunnable(saxParser, xmlStream));
                parserThread.start();
                ZipEntry entry = entries.nextElement();
                System.out.println("\nOpening zip entry: " + entry.getName());
                InputStream unzippedStream = zip.getInputStream(entry);
                byte[] buf = new byte[BUF_SIZE];
                int bytesRead;
                // read() returns -1 at end of stream
                while ((bytesRead = unzippedStream.read(buf)) != -1) {
                    // write to err for different color in eclipse..
                    System.err.write(buf, 0, bytesRead);
                    out.write(buf, 0, bytesRead);
                    Thread.sleep(150); // theatrics...
                }
                // closing the write end is what signals end-of-input to the parser
                out.close();
                // wait for the parser to finish before the same SAXParser is reused for the next entry
                parserThread.join();
                unzippedStream.close();
                xmlStream.close();
            }
            zip.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
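As a possible simplification of the piped version above, a ZipInputStream can also be handed to the parser directly, with no second thread, because its read() reports end-of-stream at the end of each entry. The one catch is that some parser implementations close the stream they are given, which would close the whole archive; the sketch below guards against that with a small wrapper. The names `ZipStreamSaxDemo`, `NonClosingStream`, `rootElements`, and the sample data are hypothetical, chosen only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ZipStreamSaxDemo {

    /** Hypothetical wrapper: swallows close() so the parser cannot close the archive stream. */
    static class NonClosingStream extends FilterInputStream {
        NonClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally left open; the ZipInputStream is closed by its owner
        }
    }

    /** Parses each entry of a zipped stream in turn and records its root element name. */
    static List<String> rootElements(InputStream zipped) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<String> roots = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Reads on zin return -1 at the end of the current entry,
                // so the parser sees exactly one document per call.
                parser.parse(new NonClosingStream(zin), new DefaultHandler() {
                    private boolean seenRoot;

                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes atts) {
                        if (!seenRoot) {
                            roots.add(qName);
                            seenRoot = true;
                        }
                    }
                });
            }
        }
        return roots;
    }

    /** Builds a two-entry sample archive in memory so the sketch is runnable as is. */
    static byte[] sampleZipBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("one.xml"));
            out.write("<catalog><book/></catalog>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("two.xml"));
            out.write("<order/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElements(new ByteArrayInputStream(sampleZipBytes())));
    }
}
```

This variant is mainly useful when the zip arrives as a stream (for example over a network) rather than as a file on disk; when a file is available, ZipFile.getInputStream is the more direct route.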