How to modify a large Excel file when running into memory problems

Date: 2019-06-24 14:58:32

Tags: java excel xml apache-poi

As the title says, I have a large Excel file (> 200 sheets) that I need to add data to. I do not want to create new cells, only modify existing ones.

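I tried using Apache POI, but my application runs out of memory even with Xms and Xmx set to 8g. The only option for low-memory writing seems to be SXSSF. The problem is that it only works for creating new cells and does not allow modifying existing ones. I also tried using the event API to process the sheet's XML, but it seems to work only for read operations. I have been experimenting with XMLEventWriter, but I cannot find a way to access the sheet's XML data in a form that can be written to. Is there a way to access the XML data of an Excel file other than through XSSFReader?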

1 answer:

Answer 0 (score: 1)

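As stated in the comments above, there is no one-size-fits-all solution for reading and writing Office Open XML spreadsheets using plain XML. Each Excel workbook needs its own code, depending on its structure and on what should be changed.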

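The reason is that the high-level classes of apache poi provide a meta level, that is, an abstraction over the raw XML, which spares you this work. But that meta level needs memory to work, and for very large workbooks it needs a lot of it. If you manipulate the XML directly in order to avoid this memory consumption, the meta level is not available. You therefore have to know the XML structure of the worksheet and the meaning of the XML elements used.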
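To make that XML structure concrete: in a worksheet part such as /xl/worksheets/sheet1.xml, each row element contains c (cell) elements, and a cell's content sits in a v child element. A cell with t="s" stores an index into the shared strings table rather than the text itself; without a t attribute the cell is numeric. The following is only a minimal sketch that walks a hand-written sample fragment with StAX. The sample XML and the class name SheetXmlPeek are made up for illustration and are not part of apache poi.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SheetXmlPeek {

 // Hand-written sample shaped like a worksheet part (illustrative only).
 static final String SAMPLE =
   "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
   + "<sheetData>"
   + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\"><v>12.5</v></c></row>"
   + "<row r=\"2\"><c r=\"B2\"><v>7</v></c></row>"
   + "</sheetData></worksheet>";

 // Returns one entry per cell: reference, type and the raw content of <v>.
 static List<String> describeCells(String xml) throws Exception {
  List<String> result = new ArrayList<>();
  XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
  String ref = null;
  String type = null;
  while (reader.hasNext()) {
   XMLEvent event = reader.nextEvent();
   if (!event.isStartElement()) {
    continue;
   }
   StartElement start = event.asStartElement();
   String name = start.getName().getLocalPart();
   if (name.equals("c")) { // start element of cell
    Attribute r = start.getAttributeByName(new QName("r"));
    Attribute t = start.getAttributeByName(new QName("t"));
    ref = (r == null) ? "?" : r.getValue();
    type = (t == null) ? "n" : t.getValue(); // a missing t means numeric
   } else if (name.equals("v")) { // start element of value
    result.add(ref + " type=" + type + " v=" + reader.getElementText());
   }
  }
  reader.close();
  return result;
 }

 public static void main(String[] args) throws Exception {
  describeCells(SAMPLE).forEach(System.out::println);
 }
}
```

Note that row 2 of the sample has no cell for column A at all: empty cells are typically not stored in the sheet XML, which matters for any code that identifies columns by counting c elements.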

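So, if we have an Excel workbook whose first worksheet has strings in column A and numbers in column B, then the following code changes the values in columns A and B of every fifth row by manipulating the XML directly using StAX: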

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;

import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import javax.xml.namespace.QName;

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;

import java.util.regex.Pattern;

class StaxReadAndChangeTest {

 public static void main(String[] args) throws Exception {
  File file = new File("ReadAndWriteTest.xlsx");
  OPCPackage opcpackage = OPCPackage.open(file);

  //since there are strings in the sheet data, we need the SharedStringsTable
  PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
  SharedStringsTable sharedstringstable = new SharedStringsTable();
  sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

  //get first worksheet
  PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

  //get XML reader and writer            
  XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
  XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());

  XMLEventFactory eventFactory = XMLEventFactory.newInstance();

  int rowsCount = 0;
  int colsCount = 0;
  boolean cellAfound = false;
  boolean cellBfound = false;

  while(reader.hasNext()){ //loop over all XML in sheet1.xml
   XMLEvent event = (XMLEvent)reader.next();
   if(event.isStartElement()) {
    StartElement startElement = (StartElement)event;
    QName startElementName = startElement.getName();
    if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
     rowsCount++;
     colsCount = 0;
    } else if (startElementName.getLocalPart().equalsIgnoreCase("c")) { //start element of cell
     colsCount++;
     cellAfound = false;
     cellBfound = false;
     if (rowsCount % 5 == 0) { // every 5th row
      if (colsCount == 1) { // cell A
       cellAfound = true;
      } else if (colsCount == 2) { // cell B
       cellBfound = true;
      } 
     }
    } else if (startElementName.getLocalPart().equalsIgnoreCase("v")) { //start element of value
     if (cellAfound) {
      // create new rich text content for cell A
      CTRst ctstr = CTRst.Factory.newInstance();
      ctstr.setT("changed String Value A" + (rowsCount));
      //int sRef = sharedstringstable.addEntry(ctstr);
      int sRef = sharedstringstable.addSharedStringItem(new XSSFRichTextString(ctstr));
      // set the new characters for A's value in the XML
      if (reader.hasNext()) {
       writer.add(event); // write the old event
       event = (XMLEvent)reader.next(); // get next event - should be characters
       if (event.isCharacters()) {
        Characters value = eventFactory.createCharacters(Integer.toString(sRef));
        event = value;
       } 
      }        
     } else if (cellBfound) {
      // set the new characters for B's value in the XML
      if (reader.hasNext()) {
       writer.add(event); // write the old event
       event = (XMLEvent)reader.next(); // get next event - should be characters
       if(event.isCharacters()) { 
        double oldValue = Double.valueOf(((Characters)event).getData()); // old double value
        Characters value = eventFactory.createCharacters(Double.toString(oldValue * rowsCount));
        event = value;         
       }         
      }
     }
    }
   }
   writer.add(event); //by default write each read event
  }
  writer.flush();

  //write the SharedStringsTable
  OutputStream out = sharedstringstablepart.getOutputStream();
  sharedstringstable.writeTo(out);
  out.close();
  opcpackage.close();

 }
}

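Compared with using the XSSF classes of apache poi, this consumes much less memory. But, as stated above, it only works exactly for this kind of Excel workbook, one whose first worksheet has strings in column A and numbers in column B.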
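One fragile point of the code above is that it identifies columns A and B by counting c elements within a row. Since empty cells are typically not stored in the sheet XML, the first c element of a row is not necessarily column A. A more robust variant could read the cell's r attribute (for example "B5") instead, assuming the cells carry one; Excel writes it, but the schema marks it as optional. The helper below is a sketch of that idea. The class CellRefs and its method names are hypothetical and not part of apache poi.

```java
class CellRefs {

 // Column letters of an A1-style reference, e.g. "AB12" -> "AB".
 static String columnLetters(String cellRef) {
  int i = 0;
  while (i < cellRef.length() && Character.isLetter(cellRef.charAt(i))) {
   i++;
  }
  return cellRef.substring(0, i);
 }

 // Zero-based column index: "A1" -> 0, "Z1" -> 25, "AA1" -> 26.
 static int columnIndex(String cellRef) {
  int index = 0;
  for (char ch : columnLetters(cellRef).toUpperCase().toCharArray()) {
   index = index * 26 + (ch - 'A' + 1);
  }
  return index - 1;
 }

 // One-based row number: "AB12" -> 12.
 static int rowNumber(String cellRef) {
  return Integer.parseInt(cellRef.substring(columnLetters(cellRef).length()));
 }
}
```

In the loop above, the reference would come from startElement.getAttributeByName(new QName("r")), and cellAfound would then be set when columnIndex(ref) == 0 rather than when colsCount == 1. If the apache poi jars are on the classpath anyway, org.apache.poi.ss.util.CellReference can parse such references as well.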