我试图通过Apache POI读取大型excel文件xlsx,比如40-50 MB。我失去了内存异常。当前堆内存为3GB。
我可以毫无问题地阅读较小的excel文件。我需要一种方法来读取大型excel文件,然后通过Spring excel视图将它们作为响应返回。
public class FetchExcel extends AbstractView {
@Override
protected void renderMergedOutputModel(
Map model, HttpServletRequest request, HttpServletResponse response)
throws Exception {
String fileName = "SomeExcel.xlsx";
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(pkg);
ServletOutputStream respOut = response.getOutputStream();
pkg.close();
workbook.write(respOut);
respOut.flush();
workbook = null;
response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");
}
}
我首先开始使用XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in);
但是每个Apache POI API的成本很高,所以我改用OPC封装方式,但效果仍然相同。我不需要解析或处理文件,只需阅读并返回即可。
答案 0 :(得分:14)
以下是使用sax解析器读取大型xls文件的示例。
public void parseExcel(File file) throws IOException {
OPCPackage container;
try {
container = OPCPackage.open(file.getAbsolutePath());
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
XSSFReader xssfReader = new XSSFReader(container);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (iter.hasNext()) {
InputStream stream = iter.next();
processSheet(styles, strings, stream);
stream.close();
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
}
}
protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
try {
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
@Override
public void startRow(int rowNum) {
}
@Override
public void endRow() {
}
@Override
public void cell(String cellReference, String formattedValue) {
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
},
false//means result instead of formula
);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
答案 1 :(得分:6)
您没有提及是否需要修改电子表格。
这可能是显而易见的,但如果您不需要修改电子表格,那么您不需要解析它并将其写回,您只需从文件中读取字节,然后写出字节,如你会想,说一个图像,或任何其他二进制格式。
如果您确实需要在将电子表格发送给用户之前对其进行修改,那么据我所知,您可能需要采取不同的方法。
我知道用Java读取Excel文件的每个库都会将整个电子表格读入内存,因此您必须为每个可能同时处理的电子表格提供50MB的内存。正如其他人所指出的,这涉及调整VM可用的堆。
如果您需要同时处理大量电子表格,并且无法分配足够的内存,请考虑使用可以流式传输的格式,而不是一次性读取到内存中。可以通过Excel打开CSV格式,过去通过将content-type设置为application / vnd.ms-excel,将附件文件名设置为以“.xls”结尾的内容,但实际上返回CSV,我已经取得了很好的效果内容。我没有在几年内尝试过,所以YMMV。
答案 2 :(得分:2)
在bellwo示例中,我将添加完整的代码,以将完整的excel文件(对我来说是60Mo)解析为对象列表,而不会出现“ 内存不足”的问题,并且可以正常工作:
import java.util.ArrayList;
import java.util.List;
class DistinctByProperty {
private static OPCPackage xlsxPackage = null;
private static PrintStream output= System.out;
private static List<MassUpdateMonitoringRow> resultMapping = new ArrayList<>();
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx");
double bytes = file.length();
double kilobytes = (bytes / 1024);
double megabytes = (kilobytes / 1024);
System.out.println("Size "+megabytes);
parseExcel(file);
}
public static void parseExcel(File file) throws IOException {
try {
xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage);
XSSFReader xssfReader = new XSSFReader(xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
try (InputStream stream = iter.next()) {
String sheetName = iter.getSheetName();
output.println();
output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, new MappingFromXml(resultMapping), stream);
}
++index;
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
}
private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, mappingFromXml, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
System.out.println("Size of Array "+resultMapping.size());
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
}
您必须添加实现
的calssSheetContentsHandler
import com.sun.org.apache.xpath.internal.operations.Bool;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
public class MappingFromXml implements SheetContentsHandler {
private List<myObject> result = new ArrayList<>();
private myObject myObject = null;
private int lineNumber = 0;
/**
* Number of columns to read starting with leftmost
*/
private int minColumns = 25;
/**
* Destination for data
*/
private PrintStream output = System.out;
public MappingFromXml(List<myObject> list) {
this.result = list;
}
@Override
public void startRow(int i) {
output.println("iii " + i);
lineNumber = i;
myObject = new myObject();
}
@Override
public void endRow(int i) {
output.println("jjj " + i);
result.add(myObject);
myObject = null;
}
@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
int columnIndex = (new CellReference(cellReference)).getCol();
if(lineNumber > 0){
switch (columnIndex) {
case 0: {//Tech id
if (formattedValue != null && !formattedValue.isEmpty())
myObject.setId(Integer.parseInt(formattedValue));
}
break;
//TODO add other cell
}
}
}
@Override
public void headerFooter(String s, boolean b, String s1) {
}
}
有关更多信息,请访问此link
答案 3 :(得分:0)
在解析xlsx文件时我也遇到了同样的OOM问题......经过两天的挣扎,我终于找到了下面非常完美的代码;
此代码基于sjxlsx。它读取xlsx并存储在HSSF表中。
[code=java]
// read the xlsx file
SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
Sheet sheetToRead = workbook.getSheet(0, false);
SheetRowReader reader = sheetToRead.newReader();
Cell[] row;
int rowPos = 0;
while ((row = reader.readRow()) != null) {
org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
int cellPos = 0;
for (Cell cell : row) {
if(cell != null){
org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
hfsCell.setCellValue(cell.getValue());
}
cellPos++;
}
rowPos++;
}
return hsfSheet;[/code]