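This is in line with "Writing a large ResultSet to a File", but the file in question is an Excel file.
I am using the Apache POI library to write an Excel file containing a large data set retrieved from a ResultSet object. The data can range from a few thousand records to about one million; I am not sure how that translates into bytes on the file system in Excel format.
Below is test code I wrote to check how long it takes to write such a large result set, and the performance implications with respect to CPU and memory.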
protected void writeResultsetToExcelFile(ResultSet rs, int numSheets, String fileNameAndPath) throws Exception {
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileNameAndPath));
int numColumns = rs.getMetaData().getColumnCount();
Workbook wb = ExcelFileUtil.createExcelWorkBook(true, numSheets);
Row heading = wb.getSheetAt(0).createRow(1);
ResultSetMetaData rsmd = rs.getMetaData();
for(int x = 0; x < numColumns; x++) {
Cell cell = heading.createCell(x+1);
cell.setCellValue(rsmd.getColumnLabel(x+1));
}
int rowNumber = 2;
int sheetNumber = 0;
while(rs.next()) {
if(rowNumber == 65001) {
log("Sheet " + sheetNumber + "written; moving onto to sheet " + (sheetNumber + 1));
sheetNumber++;
rowNumber = 2;
}
Row row = wb.getSheetAt(sheetNumber).createRow(rowNumber);
for(int y = 0; y < numColumns; y++) {
row.createCell(y+1).setCellValue(rs.getString(y+1));
wb.write(bos);
}
rowNumber++;
}
//wb.write(bos);
bos.close();
}
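I had little luck with the code above. The file being created seemed to grow rapidly (about 70 MB per second). So I stopped execution after about 10 minutes (killing the JVM when the file reached 7 GB) and tried to open the file in Excel 2007. The moment I opened it, the file size became 8 KB(!) and only the header and the first row had been created. I am not sure what I am missing here.
Any ideas?
Answer 0 (score: 44)
Use SXSSF, available in POI 3.8.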
package example;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class SXSSFexample {
public static void main(String[] args) throws Throwable {
FileInputStream inputStream = new FileInputStream("mytemplate.xlsx");
XSSFWorkbook wb_template = new XSSFWorkbook(inputStream);
inputStream.close();
SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
wb.setCompressTempFiles(true);
SXSSFSheet sh = (SXSSFSheet) wb.getSheetAt(0);
sh.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
for(int rownum = 4; rownum < 100000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
FileOutputStream out = new FileOutputStream("tempsxssf.xlsx");
wb.write(out);
out.close();
}
}
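Requires:
Answer 1 (score: 6)
Oh. I think you are writing the workbook out 944,000 times. Your wb.write(bos) call is in the inner loop. I'm not sure this is quite consistent with the semantics of the Workbook class? From what I can tell in the Javadocs of that class, that method writes the entire workbook out to the specified output stream. So as the workbook grows, each call writes out every row you have added so far.
This also explains why you see exactly one row. The first workbook written to the file (with one row) is all that gets displayed, followed by 7 GB of junk.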
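To make the size blow-up concrete, here is a small simulation rather than POI code. A StringBuilder stands in for the workbook, and a hypothetical write helper dumps the whole document each time it is called, which is the behaviour this answer attributes to Workbook.write. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Simulation only: "document" plays the role of the workbook, and write()
// always emits the ENTIRE document built so far.
final class WholeDocumentWriteDemo {

    /** Total bytes sent to the stream for a grid of rows x cols one-byte cells. */
    static long bytesWritten(int rows, int cols, boolean writeInsideLoop) {
        StringBuilder document = new StringBuilder();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                document.append('x');        // "create a cell"
                if (writeInsideLoop) {
                    write(document, out);    // the bug: full dump after every cell
                }
            }
        }
        if (!writeInsideLoop) {
            write(document, out);            // the fix: one dump after the loops
        }
        return out.size();
    }

    private static void write(StringBuilder document, ByteArrayOutputStream out) {
        byte[] bytes = document.toString().getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        System.out.println("write inside loop: " + bytesWritten(100, 10, true) + " bytes");
        System.out.println("write after loop:  " + bytesWritten(100, 10, false) + " bytes");
    }
}
```

With n cells the in-loop variant emits 1 + 2 + ... + n bytes, so the output grows with the square of the cell count. In the question's method, the corresponding change would be to remove wb.write(bos) from the inner loop and restore the commented-out call just before bos.close().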
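Answer 2 (score: 3)
Unless you have to write formulas or formatting, you should consider writing out a .csv file. It is infinitely simpler and infinitely faster, and Excel will by definition convert it automatically and correctly to .xls or .xlsx.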
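A minimal sketch of the CSV route, assuming comma-separated fields with the usual quoting convention (fields containing a comma, quote, or line break are wrapped in double quotes, and embedded quotes are doubled). The CsvLines class and its methods are hypothetical helpers, not part of any library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that turns one row of string values into one CSV line.
final class CsvLines {

    /** Quotes a field if it contains a comma, quote, CR or LF; null becomes an empty field. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped fields with commas (no trailing line break). */
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine(Arrays.asList("42", "Smith, John", "said \"hi\"")));
    }
}
```

In the question's loop, each row could be assembled from the rs.getString(i) values and appended to a BufferedWriter, one line per record, so nothing beyond the current row has to stay in memory.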
Answer 3 (score: 2)
You can use the SXSSFWorkbook implementation of Workbook. If you use styles in your Excel file, caching the styles can improve performance.
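The answer does not show how it caches styles. One common way is a flyweight-style map that builds each distinct style once and then returns the same instance; the sketch below is generic and library-free, and StyleCache is a hypothetical name. With POI, the factory passed in would presumably call workbook.createCellStyle() and configure the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical flyweight-style cache: K identifies a style (for example "header"
// or "date"), S is the style object type.
final class StyleCache<K, S> {
    private final Map<K, S> styles = new HashMap<>();
    private final Function<K, S> factory;

    StyleCache(Function<K, S> factory) {
        this.factory = factory;
    }

    /** Returns the cached style for the key, creating it on first use only. */
    S get(K key) {
        return styles.computeIfAbsent(key, factory);
    }

    int size() {
        return styles.size();
    }

    public static void main(String[] args) {
        int[] created = {0};
        StyleCache<String, Object> cache = new StyleCache<>(key -> {
            created[0]++;          // counts how often a style is actually built
            return new Object();   // stand-in for a real style object
        });
        for (int row = 0; row < 100_000; row++) {
            cache.get(row % 2 == 0 ? "even" : "odd");
        }
        System.out.println("styles created: " + created[0]);
    }
}
```

The point of the design is that the number of style objects depends on the number of distinct styles, not on the number of cells.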
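Answer 4 (score: 0)
For now I have taken @Gian's advice and limited the number of records per workbook to 500k, rolling the rest over to the next workbook. It seems to work decently. With the configuration above, each workbook took about 10 minutes.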
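The rollover arithmetic can be sketched as follows. The 500k limit comes from this answer; the RolloverPlan class, its method names, and the firstDataRow parameter are assumptions made for illustration.

```java
// Hypothetical helper: maps a zero-based record index to a workbook (or sheet)
// and to a row inside it, given a maximum number of records per container.
final class RolloverPlan {

    /** Zero-based index of the container that holds the given record. */
    static int containerIndex(long recordIndex, int maxPerContainer) {
        return (int) (recordIndex / maxPerContainer);
    }

    /** Row number inside the container; firstDataRow leaves room for a header. */
    static int rowWithinContainer(long recordIndex, int maxPerContainer, int firstDataRow) {
        return firstDataRow + (int) (recordIndex % maxPerContainer);
    }

    /** Number of containers needed for totalRecords (rounded up). */
    static int containersNeeded(long totalRecords, int maxPerContainer) {
        return (int) ((totalRecords + maxPerContainer - 1) / maxPerContainer);
    }

    public static void main(String[] args) {
        int max = 500_000;
        System.out.println("workbooks for 1,000,001 records: " + containersNeeded(1_000_001L, max));
        System.out.println("record 500,000 goes to workbook " + containerIndex(500_000L, max)
                + ", row " + rowWithinContainer(500_000L, max, 1));
    }
}
```

Keeping this arithmetic in one place avoids the hand-maintained rowNumber and sheetNumber counters used in the question's loop.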
Answer 5 (score: 0)
I updated BigGridDemo to support multiple sheets.
BigExcelWriterImpl.java
package com.gdais.common.apache.poi.bigexcelwriter;
import static com.google.common.base.Preconditions.*;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.annotation.Nonnull;
import javax.annotation.Nullable;
import org.apache.commons.io.FilenameUtils;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.google.common.base.Function;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Iterables;
public class BigExcelWriterImpl implements BigExcelWriter {
private static final String XML_ENCODING = "UTF-8";
@Nonnull
private final File outputFile;
@Nullable
private final File tempFileOutputDir;
@Nullable
private File templateFile = null;
@Nullable
private XSSFWorkbook workbook = null;
@Nonnull
private LinkedHashMap<String, XSSFSheet> addedSheets = new LinkedHashMap<String, XSSFSheet>();
@Nonnull
private Map<XSSFSheet, File> sheetTempFiles = new HashMap<XSSFSheet, File>();
BigExcelWriterImpl(@Nonnull File outputFile) {
this.outputFile = outputFile;
this.tempFileOutputDir = outputFile.getParentFile();
}
@Override
public BigExcelWriter createWorkbook() {
workbook = new XSSFWorkbook();
return this;
}
@Override
public BigExcelWriter addSheets(String... sheetNames) {
checkState(workbook != null, "workbook must be created before adding sheets");
for (String sheetName : sheetNames) {
XSSFSheet sheet = workbook.createSheet(sheetName);
addedSheets.put(sheetName, sheet);
}
return this;
}
@Override
public BigExcelWriter writeWorkbookTemplate() throws IOException {
checkState(workbook != null, "workbook must be created before writing template");
checkState(templateFile == null, "template file already written");
templateFile = File.createTempFile(FilenameUtils.removeExtension(outputFile.getName())
+ "-template", ".xlsx", tempFileOutputDir);
System.out.println(templateFile);
FileOutputStream os = new FileOutputStream(templateFile);
workbook.write(os);
os.close();
return this;
}
@Override
public SpreadsheetWriter createSpreadsheetWriter(String sheetName) throws IOException {
if (!addedSheets.containsKey(sheetName)) {
addSheets(sheetName);
}
return createSpreadsheetWriter(addedSheets.get(sheetName));
}
@Override
public SpreadsheetWriter createSpreadsheetWriter(XSSFSheet sheet) throws IOException {
checkState(!sheetTempFiles.containsKey(sheet), "writer already created for this sheet");
File tempSheetFile = File.createTempFile(
FilenameUtils.removeExtension(outputFile.getName())
+ "-sheet" + sheet.getSheetName(), ".xml", tempFileOutputDir);
Writer out = null;
try {
out = new OutputStreamWriter(new FileOutputStream(tempSheetFile), XML_ENCODING);
SpreadsheetWriter sw = new SpreadsheetWriterImpl(out);
sheetTempFiles.put(sheet, tempSheetFile);
return sw;
} catch (RuntimeException e) {
if (out != null) {
out.close();
}
throw e;
}
}
private static Function<XSSFSheet, String> getSheetName = new Function<XSSFSheet, String>() {
@Override
public String apply(XSSFSheet sheet) {
return sheet.getPackagePart().getPartName().getName().substring(1);
}
};
@Override
public File completeWorkbook() throws IOException {
FileOutputStream out = null;
try {
out = new FileOutputStream(outputFile);
ZipOutputStream zos = new ZipOutputStream(out);
Iterable<String> sheetEntries = Iterables.transform(sheetTempFiles.keySet(),
getSheetName);
System.out.println("Sheet Entries: " + sheetEntries);
copyTemplateMinusEntries(templateFile, zos, sheetEntries);
for (Map.Entry<XSSFSheet, File> entry : sheetTempFiles.entrySet()) {
XSSFSheet sheet = entry.getKey();
substituteSheet(entry.getValue(), getSheetName.apply(sheet), zos);
}
zos.close();
out.close();
return outputFile;
} finally {
if (out != null) {
out.close();
}
}
}
private static void copyTemplateMinusEntries(File templateFile,
ZipOutputStream zos, Iterable<String> entries) throws IOException {
ZipFile templateZip = new ZipFile(templateFile);
@SuppressWarnings("unchecked")
Enumeration<ZipEntry> en = (Enumeration<ZipEntry>) templateZip.entries();
while (en.hasMoreElements()) {
ZipEntry ze = en.nextElement();
if (!Iterables.contains(entries, ze.getName())) {
System.out.println("Adding template entry: " + ze.getName());
zos.putNextEntry(new ZipEntry(ze.getName()));
InputStream is = templateZip.getInputStream(ze);
copyStream(is, zos);
is.close();
}
}
}
private static void substituteSheet(File tmpfile, String entry,
ZipOutputStream zos)
throws IOException {
System.out.println("Adding sheet entry: " + entry);
zos.putNextEntry(new ZipEntry(entry));
InputStream is = new FileInputStream(tmpfile);
copyStream(is, zos);
is.close();
}
private static void copyStream(InputStream in, OutputStream out) throws IOException {
byte[] chunk = new byte[1024];
int count;
while ((count = in.read(chunk)) >= 0) {
out.write(chunk, 0, count);
}
}
@Override
public Workbook getWorkbook() {
return workbook;
}
@Override
public ImmutableList<XSSFSheet> getSheets() {
return ImmutableList.copyOf(addedSheets.values());
}
}
SpreadsheetWriterImpl.java
package com.gdais.common.apache.poi.bigexcelwriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Calendar;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.util.CellReference;
class SpreadsheetWriterImpl implements SpreadsheetWriter {
private static final String XML_ENCODING = "UTF-8";
private final Writer _out;
private int _rownum;
SpreadsheetWriterImpl(Writer out) {
_out = out;
}
@Override
public SpreadsheetWriter closeFile() throws IOException {
_out.close();
return this;
}
@Override
public SpreadsheetWriter beginSheet() throws IOException {
_out.write("<?xml version=\"1.0\" encoding=\""
+ XML_ENCODING
+ "\"?>"
+
"<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
_out.write("<sheetData>\n");
return this;
}
@Override
public SpreadsheetWriter endSheet() throws IOException {
_out.write("</sheetData>");
_out.write("</worksheet>");
closeFile();
return this;
}
/**
* Insert a new row
*
* @param rownum
* 0-based row number
*/
@Override
public SpreadsheetWriter insertRow(int rownum) throws IOException {
_out.write("<row r=\"" + (rownum + 1) + "\">\n");
this._rownum = rownum;
return this;
}
/**
* Insert row end marker
*/
@Override
public SpreadsheetWriter endRow() throws IOException {
_out.write("</row>\n");
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, String value, int styleIndex)
throws IOException {
String ref = new CellReference(_rownum, columnIndex).formatAsString();
_out.write("<c r=\"" + ref + "\" t=\"inlineStr\"");
if (styleIndex != -1) {
_out.write(" s=\"" + styleIndex + "\"");
}
_out.write(">");
_out.write("<is><t>" + value + "</t></is>");
_out.write("</c>");
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, String value) throws IOException {
createCell(columnIndex, value, -1);
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, double value, int styleIndex)
throws IOException {
String ref = new CellReference(_rownum, columnIndex).formatAsString();
_out.write("<c r=\"" + ref + "\" t=\"n\"");
if (styleIndex != -1) {
_out.write(" s=\"" + styleIndex + "\"");
}
_out.write(">");
_out.write("<v>" + value + "</v>");
_out.write("</c>");
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, double value) throws IOException {
createCell(columnIndex, value, -1);
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, Calendar value, int styleIndex)
throws IOException {
createCell(columnIndex, DateUtil.getExcelDate(value, false), styleIndex);
return this;
}
@Override
public SpreadsheetWriter createCell(int columnIndex, Calendar value)
throws IOException {
createCell(columnIndex, value, -1);
return this;
}
}
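One thing to watch in SpreadsheetWriterImpl: the String overload of createCell concatenates value directly into the <is><t>...</t></is> element, so a value containing & or < would produce malformed sheet XML. A minimal escaping helper might look like the sketch below; XmlText is a hypothetical name, and the sketch does not handle control characters that XML 1.0 disallows.

```java
// Hypothetical helper for escaping text before it is written into sheet XML.
final class XmlText {

    /** Replaces the XML special characters in value with their predefined entities. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            switch (ch) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("R&D <draft>"));
    }
}
```

In createCell the write would then become _out.write("<is><t>" + XmlText.escape(value) + "</t></is>").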
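Answer 6 (score: 0)
You can improve the performance of the Excel export by following these steps:
1) When fetching data from the database, avoid casting the result set to a list of entity classes. Instead, assign it directly to a list: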
List<Object[]> resultList =session.createSQLQuery("SELECT t1.employee_name, t1.employee_id ... from t_employee t1 ").list();
instead of
List<Employee> employeeList =session.createSQLQuery("SELECT t1.employee_name, t1.employee_id ... from t_employee t1 ").list();
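2) When the data is not empty, create the Excel workbook object with SXSSFWorkbook rather than XSSFWorkbook, and create new rows with SXSSFRow.
3) Iterate over the data list with java.util.Iterator.
Iterator itr = resultList.iterator();
4) Write the data to Excel, advancing the column index with column++.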
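int rowCount = 0;
while (itr.hasNext()) {
    int column = 0; // restart at the first column for every row
    // xssfSheet belongs to an SXSSFWorkbook, so the created row is an SXSSFRow
    Row row = xssfSheet.createRow(rowCount++);
    Object[] object = (Object[]) itr.next();
    // column 1 (apply any required cell style to the created cell here)
    row.createCell(column).setCellValue(String.valueOf(object[column]));
    column++;
    // column 2
    row.createCell(column).setCellValue(String.valueOf(object[column]));
    column++;
    itr.remove(); // drop the row from the list once it has been written
}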
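5) While iterating over the list, write the data to the Excel sheet and remove each row from the list with the remove method. This avoids keeping data that is no longer needed in the list and frees up Java heap space.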
itr.remove();