Apache POI SAX: handling empty cells when writing

Date: 2016-02-17 11:47:24

Tags: spring spring-mvc apache-poi

public void processOneSheet(String filename) throws Exception {
    OPCPackage pkg = OPCPackage.open(filename);
    XSSFReader r = new XSSFReader( pkg );

    SharedStringsTable sst = r.getSharedStringsTable();

    XMLReader parser = fetchSheetParser(sst);

    // To look up the Sheet Name / Sheet Order / rID,
    //  you need to process the core Workbook stream.
    // Normally it's of the form rId# or rSheet#
    InputStream sheet2 = r.getSheet("rId2");
    InputSource sheetSource = new InputSource(sheet2);
    parser.parse(sheetSource);
    sheet2.close();
}

public void processAllSheets(String filename) throws Exception {
    OPCPackage pkg = OPCPackage.open(filename);
    XSSFReader r = new XSSFReader( pkg );

    SharedStringsTable sst = r.getSharedStringsTable();

    XMLReader parser = fetchSheetParser(sst);

    Iterator<InputStream> sheets = r.getSheetsData();
    while(sheets.hasNext()) {
        System.out.println("Processing new sheet:\n");
        InputStream sheet = sheets.next();
        InputSource sheetSource = new InputSource(sheet);
        parser.parse(sheetSource);
        sheet.close();
        System.out.println("end Processing");
    }
}

public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
    XMLReader parser =
        XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser"
        );
    ContentHandler handler = new SheetHandler(sst);
    parser.setContentHandler(handler);
    return parser;
}

/**
 * See org.xml.sax.helpers.DefaultHandler javadocs
 */
private static class SheetHandler extends DefaultHandler {
    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;

    private SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        // c => cell
        if(name.equals("c")) {
            // Print the cell reference

            // Figure out if the value is an index in the SST
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true;
            } else {
                nextIsString = false;
            }
        }
        // Clear contents cache
        lastContents = "";
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once
        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }
        // ...

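My file has the following structure:

 A       B       C       D

1  text  text  text

2  text  text  text

I read an Excel file, add or change some data, and write the result out as a new file. Sometimes a text cell is empty, and the problem is that the empty cells are not taken into account when the Excel file is written: the content of the next cell is put in their place. How should I handle this?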

2 answers:

Answer 0 (score: 2)

Excel files are written "sparsely": cells that are not used are left out to save space. You need to take this into account when you read the file.
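To make the sparseness concrete, here is a minimal sketch that parses a hand-written fragment shaped like worksheet XML, using only the JDK's SAX parser. The fragment and the class name SparseRowDemo are made up for illustration; a real sheet part carries namespaces and more attributes. Row 1 contains no `<c>` element for B1, so the parser never raises an event for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SparseRowDemo {
    // Hypothetical fragment: A1 and C1 are stored, B1 is simply absent.
    static final String XML =
        "<sheetData><row r=\"1\">"
        + "<c r=\"A1\"><v>1</v></c>"
        + "<c r=\"C1\"><v>3</v></c>"
        + "</row></sheetData>";

    // Returns the references of the cells the parser actually reports.
    static List<String> cellRefs(String xml) {
        final List<String> refs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("c".equals(qName)) {
                    refs.add(attrs.getValue("r"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
        return refs;
    }
}
```

Calling `SparseRowDemo.cellRefs(SparseRowDemo.XML)` yields only the references A1 and C1. A handler that simply appends each value it sees will therefore shift C1's content into column B.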

If you are using the simple (but memory-hungry) UserModel, you can use something like a MissingCellPolicy to control how these missing cells are handled.

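If you want to go low-level and process the file with SAX events, as you appear to be doing, then you need to handle this yourself. Grab the reference of each cell as it goes by, keep track of the last cell you saw, and at that point deal with any cells that were skipped.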
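Applied to a raw SAX handler like the one in the question, that idea could look roughly like the sketch below. It uses only the JDK, and everything in it (the class name GapFillingHandler, the columnIndex helper, the in-memory row lists) is illustrative rather than part of POI. It assumes every `<c>` element carries an `r` attribute and keeps the raw `<v>` text, so the shared-string lookup from the question's handler would still have to be added.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class GapFillingHandler extends DefaultHandler {
    final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;
    private int lastCol = -1;          // column of the last cell seen in this row
    private boolean inValue;
    private final StringBuilder value = new StringBuilder();

    // Converts the letters of a reference such as "C7" to a 0-based column index.
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char ch = cellRef.charAt(i);
            if (!Character.isLetter(ch)) {
                break;
            }
            col = col * 26 + (Character.toUpperCase(ch) - 'A' + 1);
        }
        return col - 1;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("row".equals(qName)) {
            currentRow = new ArrayList<>();
            lastCol = -1;
        } else if ("c".equals(qName)) {
            int thisCol = columnIndex(attrs.getValue("r"));
            // Emit an empty string for every column skipped since the last cell.
            for (int c = lastCol + 1; c < thisCol; c++) {
                currentRow.add("");
            }
            lastCol = thisCol;
            value.setLength(0);
        } else if ("v".equals(qName)) {
            inValue = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inValue) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
        } else if ("c".equals(qName)) {
            currentRow.add(value.toString());
        } else if ("row".equals(qName)) {
            rows.add(currentRow);
        }
    }

    // Convenience wrapper for trying the handler on an XML string.
    static List<List<String>> parse(String xml) {
        GapFillingHandler handler = new GapFillingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
        return handler.rows;
    }
}
```

The padding happens when the next stored cell arrives, so blanks at the end of a row are not covered by this sketch. In real code, POI's `CellReference` can replace the hand-written columnIndex helper.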

Apache POI has a great example of doing this, in the SAX-powered XLSX to CSV example converter. You will need to follow largely the same logic, for example:

private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentCol = -1;

    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }

        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i=0; i<missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;

        // .... Rest of cell contents handling method goes here ....

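Answer 1 (score: 0)

I ran into the same situation with a SAX event-based xlsx parser, and I simply extended Gagravarr's approach to fit my requirements. In this model you keep track of the last processed column, so you can handle the skipped columns before the current one however you like, knowing that those cells are blank.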

@Override
public void cell(String cellReference, String formattedValue) {
    int currentColumnIndex = (new CellReference(cellReference)).getCol();
    for (int columnIndex = previousCol + 1; columnIndex <= currentColumnIndex; ++columnIndex) {
        if (columnIndex != currentColumnIndex) {
            setValueInRowObject(columnIndex, "");
        } else {
            setValueInRowObject(columnIndex, formattedValue);
        }
    }
    previousCol = currentColumnIndex;
}

Here previousCol is an instance field, initialized to -1 and reset whenever the startRow method is called:

@Override
public void startRow(int rowNum) {
    previousCol = -1;
}

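Basically, I have certain amount and date columns, and I need to populate a warning message when those columns carry invalid values; a blank is not a valid amount or date. But because the cell method is not called for empty cells, I could not populate the warning message for that particular cell.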
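One gap remains with the approach above: padding inside cell() only covers blanks that sit before a stored cell, so blanks at the end of a row never trigger it. A possible way around this, sketched below with only the JDK, is to pad up to an expected column count when the row ends (in a SheetContentsHandler that would be the endRow callback) and run the validation on the padded row. The class BlankAwareRow, its method names, and the expected column count are assumptions for illustration, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

class BlankAwareRow {
    private final int expectedCols;                 // assumed to be known by the caller
    private final List<String> values = new ArrayList<>();

    BlankAwareRow(int expectedCols) {
        this.expectedCols = expectedCols;
    }

    // Call at the start of each row.
    void startRow() {
        values.clear();
    }

    // Call for each stored cell; fills the skipped columns before it with "".
    void cell(int columnIndex, String formattedValue) {
        while (values.size() < columnIndex) {
            values.add("");
        }
        values.add(formattedValue);
    }

    // Call at the end of each row; fills trailing blanks and returns a copy.
    List<String> endRow() {
        while (values.size() < expectedCols) {
            values.add("");
        }
        return new ArrayList<>(values);
    }

    // Produces one warning per required column whose value is blank.
    static List<String> warnings(List<String> row, int[] requiredCols) {
        List<String> out = new ArrayList<>();
        for (int col : requiredCols) {
            if (col >= row.size() || row.get(col).isEmpty()) {
                out.add("column " + col + " is blank");
            }
        }
        return out;
    }
}
```

With four expected columns and stored cells only in columns 0 and 2, endRow() returns a four-element row whose columns 1 and 3 are empty strings, and warnings(row, new int[] {1, 2}) reports column 1 only. Checks for invalid amounts or dates could be added in the same place.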