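Adding rows to a large xlsx file (out of memory)

Time: 2017-01-12 18:22:42

Tags: java excel apache apache-poi heap-memory

The situation is as follows: I have a simple program that uses the Apache POI library to append a row of data to the end of an existing xlsx file. See below.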

File file = new File(input);
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
XSSFRow row = sheet.createRow(sheet.getLastRowNum() + 1);

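After this, I iterate over the row and set the cell values. The problem is that the second line of the code shown above (new XSSFWorkbook(file)) throws an out-of-memory error. Is there a way to add a row of data to an existing xlsx file without having to read the whole file?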

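3 answers:

Answer 0: (score: 1)

You could try XSSF and SAX (Event API).

Answer 1: (score: 1)

If XSSFWorkbook fails with an out-of-memory error and the workbook needs to be both read and written, then neither SXSSF nor the SAX parser will help. The former is only for writing; the latter is only for reading.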

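Both of the following approaches require knowledge of the Office Open XML file format. In general, a *.xlsx file is a ZIP archive containing XML files and other files in a special directory structure. So you can unzip the *.xlsx file with any ZIP software to look at the XML files. The file format was first standardized by Ecma, so for further reference I prefer the Ecma Markup Language Reference; see Row, for example.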
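To make the "a *.xlsx file is a ZIP archive" point concrete, here is a minimal sketch that lists archive entries using only java.util.zip, without parsing any XML. The class name XlsxZipPeek and the in-memory sample archive are assumptions made for illustration; the sample contains only a few typical part names, whereas a real workbook contains more parts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class XlsxZipPeek {

    // Builds a tiny stand-in archive in memory (hypothetical content, not a valid workbook).
    static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            String[] names = {
                "[Content_Types].xml",
                "xl/workbook.xml",
                "xl/sharedStrings.xml",
                "xl/worksheets/sheet1.xml"
            };
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Lists the entry names of a ZIP archive; the entry contents are skipped, not parsed.
    static List<String> listEntries(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(buildSampleArchive())) {
            System.out.println(name);
        }
    }
}
```

To inspect a real file, the same listEntries logic could be fed the bytes of the workbook instead of the sample archive.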

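The ReadAndWriteTest.xlsx used in both examples must have at least one worksheet, and the first worksheet must have at least one row.

One approach could be the XMLBeans DOM approach. My favorite reference for this is grepcode.

Example: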

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;

import java.io.File;
import java.io.OutputStream;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;

import org.apache.xmlbeans.XmlOptions;

import javax.xml.namespace.QName;

import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;

class DOMReadAndWriteTest {

    public static void main(String[] args) {
        try {
            File file = new File("ReadAndWriteTest.xlsx");

            // we only open the OPCPackage, we don't create a Workbook
            OPCPackage opcpackage = OPCPackage.open(file);

            // if there are strings in the SheetData, we need the SharedStringsTable
            PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
            SharedStringsTable sharedstringstable = new SharedStringsTable();
            sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

            // get the PackagePart of the first sheet
            PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

            // get the worksheet from the first sheet's XML
            // if even parsing this fails, then this approach is not usable
            WorksheetDocument worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
            CTWorksheet worksheet = worksheetdocument.getWorksheet();
            CTSheetData sheetdata = worksheet.getSheetData();

            // put some data in 10 new rows
            for (int i = 0; i < 10; i++) {
                int rowsCount = sheetdata.sizeOfRowArray();

                // cell A: a string, stored as an index into the SharedStringsTable
                CTCell ctcell = sheetdata.addNewRow().addNewC();
                CTRst ctstr = CTRst.Factory.newInstance();
                ctstr.setT("new Row " + (rowsCount + 1));
                int sRef = sharedstringstable.addEntry(ctstr);
                ctcell.setT(STCellType.S);
                ctcell.setV(Integer.toString(sRef));

                // cell B: a numeric value, stored inline
                ctcell = sheetdata.getRowArray(rowsCount).addNewC();
                ctcell.setV("" + rowsCount + "." + (i + 1) + "" + ((i + 2 > 9) ? 0 : i + 2));
            }

            // write the SharedStringsTable
            OutputStream out = sharedstringstablepart.getOutputStream();
            sharedstringstable.writeTo(out);
            out.close();

            // create XmlOptions for saving the worksheet
            XmlOptions xmlOptions = new XmlOptions();
            xmlOptions.setSaveOuter();
            xmlOptions.setUseDefaultNamespace();
            xmlOptions.setSaveAggressiveNamespaces();
            xmlOptions.setCharacterEncoding("UTF-8");
            xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
            Map<String, String> map = new HashMap<String, String>();
            map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
            xmlOptions.setSaveSuggestedPrefixes(map);

            // save the worksheet
            out = sheetpart.getOutputStream();
            worksheet.save(out, xmlOptions);
            out.close();

            opcpackage.close();

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

This code writes 10 new rows into sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. However, it must at least open and parse sheet1 and the SharedStringsTable. If even that fails, this approach is not usable.

Another approach could be to use StAX. This API can read and write XML in an event-driven way, using streaming.

Example:

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

import javax.xml.namespace.QName;

import java.io.File;
import java.io.OutputStream;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class StaxReadAndWriteTest {

    public static void main(String[] args) {
        try {
            File file = new File("ReadAndWriteTest.xlsx");
            OPCPackage opcpackage = OPCPackage.open(file);

            // if there are strings in the sheet data, we need the SharedStringsTable
            // if even parsing this SharedStringsTable fails, then this approach is not usable
            // and this XML would have to be streamed event-driven as well
            PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
            SharedStringsTable sharedstringstable = new SharedStringsTable();
            sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

            PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());
            XMLEventFactory eventFactory = XMLEventFactory.newInstance();

            int rowsCount = 0;

            while (reader.hasNext()) { // loop over all XML in sheet1.xml
                XMLEvent event = (XMLEvent) reader.next();
                writer.add(event); // by default, write each event that was read
                if (event.isStartElement()) {
                    StartElement startElement = (StartElement) event;
                    QName startElementName = startElement.getName();
                    if (startElementName.getLocalPart().equalsIgnoreCase("row")) { // start element of row
                        boolean rowStart = true;
                        rowsCount++;
                        do {
                            event = (XMLEvent) reader.next(); // find this row's end
                            writer.add(event); // by default, write each event that was read
                            if (event.isEndElement()) {
                                EndElement endElement = (EndElement) event;
                                QName endElementName = endElement.getName();
                                if (endElementName.getLocalPart().equalsIgnoreCase("row")) { // end element of row
                                    rowStart = false;
                                    // we assume that there is nothing else (character data)
                                    // between the end element of row and the next element
                                    XMLEvent nextElement = (XMLEvent) reader.peek();
                                    QName nextElementName = null;
                                    if (nextElement.isStartElement()) nextElementName = ((StartElement) nextElement).getName();
                                    else if (nextElement.isEndElement()) nextElementName = ((EndElement) nextElement).getName();
                                    if (!nextElementName.getLocalPart().equalsIgnoreCase("row")) { // next is not start element of row
                                        // we have the last row, so we write the new rows now
                                        for (int i = 0; i < 10; i++) {
                                            StartElement newRowStart = eventFactory.createStartElement(new QName("row"), null, null);
                                            writer.add(newRowStart);

                                            // start cell A
                                            Attribute attribute = eventFactory.createAttribute("t", "s");
                                            List<Attribute> attributeList = Arrays.asList(attribute);
                                            StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
                                            writer.add(newCellStart);
                                            CTRst ctstr = CTRst.Factory.newInstance();
                                            ctstr.setT("new Row " + (rowsCount + 1));
                                            int sRef = sharedstringstable.addEntry(ctstr);
                                            StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
                                            writer.add(newCellValue);
                                            Characters value = eventFactory.createCharacters(Integer.toString(sRef));
                                            writer.add(value);
                                            EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
                                            writer.add(newCellValueEnd);
                                            EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
                                            writer.add(newCellEnd);
                                            // end cell A

                                            // start cell B
                                            newCellStart = eventFactory.createStartElement(new QName("c"), null, null);
                                            writer.add(newCellStart);
                                            newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
                                            writer.add(newCellValue);
                                            value = eventFactory.createCharacters("" + rowsCount + "." + (i + 1) + "" + ((i + 2 > 9) ? 0 : i + 2));
                                            writer.add(value);
                                            newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
                                            writer.add(newCellValueEnd);
                                            newCellEnd = eventFactory.createEndElement(new QName("c"), null);
                                            writer.add(newCellEnd);
                                            // end cell B

                                            EndElement newRowEnd = eventFactory.createEndElement(new QName("row"), null);
                                            writer.add(newRowEnd);
                                            rowsCount++;
                                        }
                                    }
                                }
                            }
                        } while (rowStart);
                    }
                }
            }
            writer.flush();

            // write the SharedStringsTable
            OutputStream out = sharedstringstablepart.getOutputStream();
            sharedstringstable.writeTo(out);
            out.close();

            opcpackage.close();

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

This code also writes 10 new rows into sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. But it must at least open and parse the SharedStringsTable. If even that fails, this approach is not usable. Of course, the SharedStringsTable could also be streamed using StAX. But as the row and cell generation in the example shows, that is much more complicated, so using SharedStringsTable simplifies things in this example.
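The core StAX pattern (copy every event through and inject new elements at the right point) can also be tried in isolation, without POI and without a real workbook. The following is a minimal sketch under stated assumptions: the class StaxAppendRowSketch, the method appendNumericRow, and the tiny in-memory sheet XML are hypothetical, and the trigger used here is the end of sheetData rather than the peek-ahead after the last row used above. The new row omits the r attributes, just like the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxAppendRowSketch {

    // Main SpreadsheetML namespace, declared as the default namespace in sheet XML.
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Copies sheetXml event by event and writes one extra row with a single
    // numeric cell immediately before the closing sheetData tag.
    static String appendNumericRow(String sheetXml, String value) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(sheetXml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        String[] path = {"row", "c", "v"};

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEndElement()
                    && "sheetData".equals(event.asEndElement().getName().getLocalPart())) {
                // open row, c and v, write the value, then close them in reverse order
                for (String name : path) {
                    writer.add(events.createStartElement("", NS, name));
                }
                writer.add(events.createCharacters(value));
                for (int i = path.length - 1; i >= 0; i--) {
                    writer.add(events.createEndElement("", NS, path[i]));
                }
            }
            writer.add(event); // pass every original event through unchanged
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String sheet = "<worksheet xmlns=\"" + NS + "\"><sheetData>"
                + "<row><c><v>1</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(appendNumericRow(sheet, "42"));
    }
}
```

Because only one event is held at a time, memory use stays roughly constant regardless of how many rows the sheet already contains, which is the reason for preferring this over the DOM approach for very large sheets.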

Answer 2: (score: 0)

(Not enough reputation to add this as a comment.) Have you tried using SXSSFWorkbook instead of XSSFWorkbook?