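Java SAX parser progress monitoring

Time: 2010-06-23 08:21:19

Tags: java xml parsing sax progress

I'm writing a SAX parser in Java to parse a 2.5 GB XML file of Wikipedia articles. Is there a way to monitor the progress of the parsing in Java?

5 answers:

Answer 0 (score: 11)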




import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import javax.swing.event.ChangeEvent;
import javax.swing.event.ChangeListener;

/**
 * A class that monitors the read progress of an input stream.
 * @author Hermia Yeung "Sheepy"
 * @since 2012-04-05 18:42
 */
public class MonitoredInputStream extends FilterInputStream {
   private volatile long mark = 0;
   private volatile long lastTriggeredLocation = 0;
   private volatile long location = 0;
   private final int threshold;
   private final List<ChangeListener> listeners = new ArrayList<>(4);

   /**
    * Creates a MonitoredInputStream over an underlying input stream.
    * @param in Underlying input stream, should be non-null because of no public setter
    * @param threshold Min. position change (in bytes) to trigger a change event.
    */
   public MonitoredInputStream(InputStream in, int threshold) {
      super(in);
      this.threshold = threshold;
   }

   /**
    * Creates a MonitoredInputStream over an underlying input stream.
    * Default threshold is 16KB; a small threshold may impact performance on larger streams.
    * @param in Underlying input stream, should be non-null because of no public setter
    */
   public MonitoredInputStream(InputStream in) {
      super(in);
      this.threshold = 1024*16;
   }

   public void addChangeListener(ChangeListener l) { if (!listeners.contains(l)) listeners.add(l); }
   public void removeChangeListener(ChangeListener l) { listeners.remove(l); }
   public long getProgress() { return location; }

   // Notifies listeners, but only once the position has moved by at least `threshold` bytes.
   protected void triggerChanged( final long location ) {
      if ( threshold > 0 && Math.abs( location-lastTriggeredLocation ) < threshold ) return;
      lastTriggeredLocation = location;
      if (listeners.size() <= 0) return;
      try {
         final ChangeEvent evt = new ChangeEvent(this);
         for (ChangeListener l : listeners) l.stateChanged(evt);
      } catch (ConcurrentModificationException e) {
         triggerChanged(location);  // List changed? Let's re-try.
      }
   }

   @Override public int read() throws IOException {
      final int i = super.read();
      if ( i != -1 ) triggerChanged( ++location ); // report the position after this byte
      return i;
   }

   @Override public int read(byte[] b, int off, int len) throws IOException {
      final int i = super.read(b, off, len);
      if ( i > 0 ) triggerChanged( location += i );
      return i;
   }

   @Override public long skip(long n) throws IOException {
      final long i = super.skip(n);
      if ( i > 0 ) triggerChanged( location += i );
      return i;
   }

   @Override public void mark(int readlimit) {
      super.mark(readlimit);
      mark = location;
   }

   @Override public void reset() throws IOException {
      super.reset();
      if ( location != mark ) triggerChanged( location = mark );
   }
}
It neither knows nor cares how big the underlying stream is, so you need to get the size some other way, for example from the file itself.


try (
   MonitoredInputStream mis = new MonitoredInputStream(new FileInputStream(file), 65536*4)
) {

   // Setup max progress and listener to monitor read progress
   // (note: the (int) casts below overflow for files larger than 2 GB)
   progressBar.setMaxProgress( (int) file.length() ); // Swing thread or before display please
   mis.addChangeListener( new ChangeListener() { @Override public void stateChanged(ChangeEvent e) {
      SwingUtilities.invokeLater( new Runnable() { @Override public void run() {
         progressBar.setProgress( (int) mis.getProgress() ); // Promise me you WILL use MVC instead of this anonymous class mess!
      }});
   }});
   // Start parsing. Listener would call Swing event thread to do the update.
   SAXParserFactory.newInstance().newSAXParser().parse(mis, this);

} catch ( IOException | ParserConfigurationException | SAXException e) {

   // handle or log the error here

} finally {

   progressBar.setVisible(false); // Again please call this in swing event thread

}

Hope it helps. If you find mistakes or typos, feel free to edit, or vote it up to give me some encouragement! :D
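
One caveat when applying the usage above to the 2.5 GB file from the question: `(int) file.length()` no longer fits in an `int`. A minimal sketch of one way around that is to do the arithmetic in `long` and hand the progress bar a 0..100 percentage instead of a raw byte count. The `ProgressMath` class and `toPercent` method are hypothetical names introduced here for illustration; they are not part of the answer's code.

```java
/** Hypothetical helper: converts a byte position into a percentage using long arithmetic. */
final class ProgressMath {
    private ProgressMath() {}

    /** Maps bytesRead to 0..100; returns 0 when the total size is unknown or empty. */
    static int toPercent(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) return 0;
        // Clamp so that a stale or overshooting position never leaves the 0..100 range.
        long clamped = Math.max(0, Math.min(bytesRead, totalBytes));
        return (int) (clamped * 100 / totalBytes);
    }

    public static void main(String[] args) {
        long total = 2_500_000_000L;                            // roughly the 2.5 GB dump
        System.out.println((int) total);                        // negative: the cast overflowed
        System.out.println(toPercent(1_250_000_000L, total));   // 50
    }
}
```

With a standard `JProgressBar`, this would presumably pair with `setMaximum(100)` and `setValue(ProgressMath.toPercent(mis.getProgress(), file.length()))` inside the listener.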

Answer 1 (score: 9)


Answer 2 (score: 2)



Answer 3 (score: 1)

Assuming you know how many articles you have, can't you just keep a counter in the handler? E.g.

public void startElement (String uri, String localName,
                          String qName, Attributes attributes)
                          throws SAXException {
    // "article" is only an example element name
    if (qName.equals("article")) counter++;
}

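If you don't know the number of articles in advance, you will need to count them first. Then you can print the status as "nb tags read/total nb of tags", say every 100 tags (counter % 100 == 0).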
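
A minimal sketch of that two-pass idea, run against a tiny in-memory document so it stays self-contained. The names `CountingHandler` and `CounterDemo` and the `article` element name are assumptions made for illustration; a real dump would be read from a file and the element name would be whatever tag marks one article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Counts one element type and, when a total is known, prints "read/total" periodically. */
class CountingHandler extends DefaultHandler {
    private final String elementName;  // hypothetical: the tag that marks one article
    private final long total;          // 0 means "total unknown, just count"
    private final int reportEvery;     // e.g. 100 for "every 100 tags"
    long counter = 0;

    CountingHandler(String elementName, long total, int reportEvery) {
        this.elementName = elementName;
        this.total = total;
        this.reportEvery = reportEvery;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!qName.equals(elementName)) return;
        counter++;
        if (total > 0 && counter % reportEvery == 0) {
            System.out.println(counter + "/" + total + " tags read");
        }
    }
}

class CounterDemo {
    /** Parses the document with the given handler and returns how many elements it counted. */
    static long parse(byte[] xml, CountingHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
            return handler.counter;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<wiki><article/><article/><article/><article/></wiki>"
                .getBytes(StandardCharsets.UTF_8);
        long total = parse(xml, new CountingHandler("article", 0, 1)); // pass 1: count only
        parse(xml, new CountingHandler("article", total, 2));          // pass 2: report every 2
    }
}
```

The obvious cost is that the file is parsed twice, which may matter for a multi-gigabyte dump.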



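Answer 4 (score: 0)

I'd use the input stream position. Make your own trivial stream class that delegates to or inherits from the "real" one and keeps track of the bytes read. As you say, getting the total file size is easy. I wouldn't worry about buffering, lookahead and so on: for large files like these it's chickenfeed. On the other hand, I'd cap the reported position at "99%".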
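
A minimal sketch of such a stream, assuming the total size is passed in by the caller. `CountingInputStream`, `percent()` and `CountingStreamDemo` are hypothetical names for illustration. The 99% cap is presumably there because the parser buffers ahead, so the stream can reach its end before parsing has actually finished.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Delegates to the real stream and keeps track of how many bytes have been read. */
class CountingInputStream extends FilterInputStream {
    private final long totalBytes;
    private long bytesRead = 0;

    CountingInputStream(InputStream in, long totalBytes) {
        super(in);
        this.totalBytes = totalBytes;
    }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) bytesRead++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) bytesRead += skipped;
        return skipped;
    }

    // mark/reset are disabled so the byte count can never run backwards.
    @Override public boolean markSupported() { return false; }

    /** Percentage read so far, capped at 99; the caller shows 100 once parsing returns. */
    int percent() {
        if (totalBytes <= 0) return 0;
        return (int) Math.min(99, bytesRead * 100 / totalBytes);
    }
}

class CountingStreamDemo {
    /** Reads up to n bytes from data and returns the percentage the stream reports. */
    static int percentAfterReading(byte[] data, int n) {
        try (CountingInputStream in =
                 new CountingInputStream(new ByteArrayInputStream(data), data.length)) {
            in.read(new byte[n], 0, n);
            return in.percent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[200];
        System.out.println(percentAfterReading(data, 100)); // 50
        System.out.println(percentAfterReading(data, 200)); // 99 (capped)
    }
}
```

For a real file the total would come from something like `file.length()`, and the stream would be handed to `SAXParser.parse` in place of the plain `FileInputStream`.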