Question

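I need to parse very large Excel files with Apache POI under a limited memory budget. After some searching, I found that POI provides a SAX-based parser that can process large files efficiently without consuming a lot of memory.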

private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
        for (int i=0; i<number; i++) {
            for (int j=0; j<minColumns; j++) {
                output.append(',');
            }
            output.append('\n');
        }
    }

    @Override
    public void startRow(int rowNum) {
        // If there were gaps, output the missing rows
        outputMissingRows(rowNum-currentRow-1);
        // Prepare for this row
        firstCellOfRow = true;
        currentRow = rowNum;
        currentCol = -1;
    }

    @Override
    public void endRow(int rowNum) {
        // Ensure the minimum number of columns
        for (int i=currentCol; i<minColumns; i++) {
            output.append(',');
        }
        output.append('\n');
    }

    @Override
    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }

        // gracefully handle missing CellRef here in a similar way as XSSFCell does
        if(cellReference == null) {
            cellReference = new CellAddress(currentRow, currentCol).formatAsString();
        }

        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i=0; i<missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;

        // Number or string?
        try {
            Double.parseDouble(formattedValue);
            output.append(formattedValue);
        } catch (NumberFormatException e) {
            output.append('"');
            output.append(formattedValue);
            output.append('"');
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Skip, no headers or footers in CSV
    }
}

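In the linked example shown above, the `cell` method only has access to the formatted value, but I need access to the actual (unformatted) value of the cell.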

Answer 1

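The current implementation of the streaming interface does not provide this. To get it, you would need to copy the code of the underlying XSSFSheetXMLHandler and adjust it so that it does not format the cell contents.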
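
As a rough illustration of what such an adjusted handler would do, here is a minimal sketch that uses only the JDK's SAX parser, not POI itself. It collects the raw text of each `<v>` element in a sheet's XML without applying any number format. The class name `RawCellValueSketch`, the `parse` helper, and the inline sample XML are made up for this example. In a real workbook the sheet stream and the shared strings would come from POI's XSSFReader, and inline strings, formulas, and cells without a reference would still need the handling that XSSFSheetXMLHandler already contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: reads raw <v> contents from sheet XML, no formatting applied.
class RawCellValueSketch extends DefaultHandler {
    private final List<String> sharedStrings;           // assumed to be loaded elsewhere
    private final List<String> cells = new ArrayList<>();
    private final StringBuilder value = new StringBuilder();
    private String cellRef;    // the "r" attribute, e.g. "A1"
    private String cellType;   // the "t" attribute, "s" means shared string
    private boolean inValue;

    RawCellValueSketch(List<String> sharedStrings) {
        this.sharedStrings = sharedStrings;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("c".equals(localName)) {
            cellRef = attrs.getValue("r");
            cellType = attrs.getValue("t");
            value.setLength(0);
        } else if ("v".equals(localName)) {
            inValue = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (inValue) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(localName)) {
            inValue = false;
        } else if ("c".equals(localName)) {
            String raw = value.toString();
            // For shared strings the raw <v> content is an index, so resolve it.
            if ("s".equals(cellType) && !raw.isEmpty()) {
                raw = sharedStrings.get(Integer.parseInt(raw));
            }
            cells.add(cellRef + "=" + raw);
        }
    }

    static List<String> parse(String sheetXml, List<String> sharedStrings) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        RawCellValueSketch handler = new RawCellValueSketch(sharedStrings);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(sheetXml)));
        return handler.cells;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
          + "<sheetData><row r=\"1\">"
          + "<c r=\"A1\" s=\"1\"><v>43831</v></c>"      // a date stored as a serial number
          + "<c r=\"B1\"><v>0.125</v></c>"
          + "<c r=\"C1\" t=\"s\"><v>0</v></c>"          // index into the shared strings
          + "</row></sheetData></worksheet>";
        System.out.println(parse(xml, Arrays.asList("hello")));
        // prints [A1=43831, B1=0.125, C1=hello]
    }
}
```

Note that the date cell comes back as its stored serial number (43831 here) rather than a formatted date, and the shared-string cell has to be resolved through its index. Those are the formatting and lookup steps the stock handler otherwise performs before calling `cell`.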

Apache POI SAX Parsing - How to get the actual value of a cell
