Fastest way to export XLSX to CSV

Date: 2016-07-24 02:02:54

Tags: java excel apache-poi

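I just tried the XSSF XLSX2CSV example, which uses the SAX event API, to export a spreadsheet of 630k rows by 5 columns to CSV, using OpenCSV for the writing. It takes at least 70 seconds to complete (and when I first profiled it on a web server I was seeing 20 minutes), whereas Excel does the same thing in under 10 seconds.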

Part of the problem is that the org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler interface looks like this:

 /**
  * You need to implement this to handle the results
  *  of the sheet parsing.
  */
 public interface SheetContentsHandler {
    /** A row with the (zero based) row number has started */
    public void startRow(int rowNum);
    /** A row with the (zero based) row number has ended */
    public void endRow(int rowNum);
    /**
     * A cell, with the given formatted value (may be null),
     *  and possibly a comment (may be null), was encountered */
    public void cell(String cellReference, String formattedValue, XSSFComment comment);
    /** A header or footer has been encountered */
    public void headerFooter(String text, boolean isHeader, String tagName);
 }

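Note that you only get one cell at a time, not the whole row. My solution was to put the cells into a map keyed by column header, and to write the row out in endRow: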

@Override
public void endRow(int rowNum) {
    if(currentRow == HEADER_ROW) {
        processRow(currentRow, columnHeaders);
    } else {
        processRow(currentRow, currentRowMap);
    }
}

private void processRow(int currentRow, LinkedHashMap<String, String> map) {
    String[] nextLine = map.values().toArray(new String[map.size()]);
    csvWriter.writeNext(nextLine);
}

/**
 * POI will not invoke this method if the cell is blank or if it detects there's no more data in the row.
 * Therefore, this is not necessarily invoked the same number of times each row.
 * The startRow method has initialised the currentRowMap to work around this.
 */
@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
    if(currentRow == HEADER_ROW) {
        columnHeaders.put(getColumnReference(cellReference), formattedValue);
    } else {
        String columnHeader = columnHeaders.get(getColumnReference(cellReference));
        currentRowMap.put(columnHeader, formattedValue);
    }
}

/**
 * Returns the alphabetic column reference from this cell reference. Example: Given 'A12' returns
 * 'A' or given 'BA205' returns 'BA'
 */
private static String getColumnReference(String cellReference) {

    if (StringUtils.isBlank(cellReference)) {
        return "";
    }

    return cellReference.split("[0-9]*$")[0];
}
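getColumnReference runs a regular-expression split for every cell, up to about 3.15M times for this sheet (630k rows × 5 columns). A minimal sketch of a cheaper alternative, assuming the references arrive in uppercase A1 form such as "A12" or "BA205": walk the leading letters and turn them into a zero-based column index with base-26 arithmetic, so nothing is allocated per cell. The class name ColumnIndex is hypothetical. POI's own org.apache.poi.ss.util.CellReference can also parse references, at the cost of creating an object per cell.

```java
// Hypothetical helper: converts the letters of an A1-style cell reference
// (e.g. "BA205") into a zero-based column index (e.g. 52) without regex or allocation.
final class ColumnIndex {
    private ColumnIndex() {}

    static int fromCellReference(String cellReference) {
        if (cellReference == null || cellReference.isEmpty()) {
            return -1; // no reference supplied
        }
        int col = 0;
        for (int i = 0; i < cellReference.length(); i++) {
            char c = cellReference.charAt(i);
            if (c < 'A' || c > 'Z') {
                break; // reached the row number digits
            }
            // Column letters are bijective base 26: A=1 ... Z=26, AA=27, ...
            col = col * 26 + (c - 'A' + 1);
        }
        return col - 1; // shift to zero-based (A -> 0)
    }
}
```

For example, fromCellReference("A12") gives 0, fromCellReference("AA1") gives 26 and fromCellReference("BA205") gives 52.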

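Adding to and reading from the LinkedHashMap, and computing the column reference with getColumnReference, happens for every cell, roughly 3M times (630k rows × 5 columns), which is very inefficient.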
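The map only serves to put each value in its column and to fill in the blank cells POI skips, so a plain String[] with one slot per column, indexed by the zero-based column number, can do the same job without hashing or key strings. A minimal sketch, assuming the column count is known up front (5 here, or the size of the header row); RowBuffer and its methods are hypothetical names, meant to be called from the startRow, cell and endRow callbacks.

```java
import java.util.Arrays;

// Hypothetical per-row buffer: one slot per column, reused for every row,
// so no map lookups or per-cell key strings are needed.
final class RowBuffer {
    private final String[] values;

    RowBuffer(int columnCount) {
        this.values = new String[columnCount];
        clear();
    }

    // Call from startRow: cells that POI never reports (blanks) stay "".
    void clear() {
        Arrays.fill(values, "");
    }

    // Call from cell(): store the value at its column, ignoring columns
    // outside the buffer and mapping a null value to "".
    void set(int columnIndex, String formattedValue) {
        if (columnIndex >= 0 && columnIndex < values.length) {
            values[columnIndex] = formattedValue == null ? "" : formattedValue;
        }
    }

    // Call from endRow: returns the internal array (no copy), so it must be
    // written out before clear() is called for the next row.
    String[] row() {
        return values;
    }
}
```

Because the array is reused, it should be handed to the writer (for example OpenCSV's CSVWriter.writeNext(String[]), which the question already uses) inside endRow, before the next row starts.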

What faster options are there for exporting XLSX to CSV?

1 answer:

Answer 0 (score: -2)

Perl (Spreadsheet::ParseExcel) doesn't take nearly that long.