How to read only the filtered rows from an Excel file using the POI library

Time: 2019-05-30 06:22:56

Tags: java apache-poi

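I am reading an Excel file with the POI library in my Java code, and so far that works fine. Now I have a new requirement. The Excel file contains many records (for example, 1000 rows) plus column headers in the first row. I apply an Excel filter to it: say there is a "Year" column and I filter for all rows with year = 2019, which leaves 15 rows.

Question: I want to process only these 15 rows in my Java code. Is there any method or other way in the POI library to determine whether the row being read is filtered in (visible) or filtered out? Thanks.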

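I already have working code, but now I am looking for a way to read only the filtered rows. I have not tried anything beyond searching the library documentation and forums.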

The code below sits inside a method. I am not used to formatting on Stack Overflow, so please ignore any formatting issues.

    // For storing data into CSV files
    StringBuffer data = new StringBuffer();
    try {
        SimpleDateFormat dtFormat = new SimpleDateFormat(CommonConstants.YYYY_MM_DD); // "yyyy-MM-dd"
        String doubleQuotes = "\"";
        FileOutputStream fos = new FileOutputStream(outputFile);
        // Get the workbook object for XLSX file
        XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
        wBook.setMissingCellPolicy(Row.RETURN_BLANK_AS_NULL);

        // Get first sheet from the workbook
        //XSSFSheet sheet = wBook.getSheetAt(0);
        XSSFSheet sheet = wBook.getSheet(CommonConstants.METADATA_WORKSHEET);
        //Row row;
        //Cell cell;
        // Iterate through each rows from first sheet
        int rows = sheet.getLastRowNum();
        int totalRows = 0;
        int colTitelNumber = 0;
        Row firstRowRecord = sheet.getRow(1);
        for (int cn = 0; cn < firstRowRecord.getLastCellNum(); cn++) {
            Cell cellObj = firstRowRecord.getCell(cn);
            if(cellObj != null) {
                String str = cellObj.toString();
                if(CommonConstants.COLUMN_TITEL.equalsIgnoreCase(str)) {
                    colTitelNumber = cn;
                    break;
                }
            }
        }
        // Start with row Number 1. We don't need 0th number row as it is for Humans to read but not required for processing.
        for (int rowNumber = 1; rowNumber <= rows; rowNumber++) {
            StringBuffer rowData = new StringBuffer();
            boolean skipRow = false;
            Row rowRecord = sheet.getRow(rowNumber);
            if (rowRecord == null) {
                LOG.error("Empty/Null record found");
            } else {
                for (int cn = 0; cn < rowRecord.getLastCellNum(); cn++) {
                    Cell cellObj = rowRecord.getCell(cn);
                    if(cellObj == null) {
                        if(cn == colTitelNumber) {
                            skipRow = true;
                            break; // The first column cell value is empty/null. Which means Titel column cell doesn't have value so don't add this row in csv.
                        }
                        rowData.append(CommonConstants.CSV_SEPARTOR);
                        continue;
                    }
                    switch (cellObj.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            rowData.append(cellObj.getBooleanCellValue() + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("Boolean:" + cellObj.getBooleanCellValue());
                            break;

                        case Cell.CELL_TYPE_NUMERIC:
                            if (DateUtil.isCellDateFormatted(cellObj)) {
                                Date date = cellObj.getDateCellValue();
                                rowData.append(dtFormat.format(date).toString() + CommonConstants.CSV_SEPARTOR);
                                //LOG.error("Date:" + cellObj.getDateCellValue());
                            } else {
                                rowData.append(cellObj.getNumericCellValue() + CommonConstants.CSV_SEPARTOR);
                                //LOG.error("Numeric:" + cellObj.getNumericCellValue());
                            }
                            break;

                        case Cell.CELL_TYPE_STRING:
                            String cellValue = cellObj.getStringCellValue();
                            // If string contains double quotes then replace it with pair of double quotes.
                            cellValue = cellValue.replaceAll(doubleQuotes, doubleQuotes + doubleQuotes);
                            // If string contains comma then surround the string with double quotes.
                            rowData.append(doubleQuotes + cellValue + doubleQuotes + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("String:" + cellObj.getStringCellValue());
                            break;

                        case Cell.CELL_TYPE_BLANK:
                            rowData.append("" + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("Blank:" + cellObj.toString());
                            break;

                        default:
                            rowData.append(cellObj + CommonConstants.CSV_SEPARTOR);
                    }
                }
                if(!skipRow) {
                    rowData.append("\r\n");
                    data.append(rowData); // Appending one entire row to main data string buffer.
                    totalRows++;
                }
            }
        }
        pTransferObj.put(CommonConstants.TOTAL_ROWS, (totalRows));
        fos.write(data.toString().getBytes());
        fos.close();
        wBook.close();
    } catch (Exception ex) {
        LOG.error("Exception Caught while generating CSV file", ex);
    }

3 Answers:

Answer 0 (score: 1)

All rows that are not visible in the sheet have zero height. So if you need to read only the visible rows, you can check each row with Row.getZeroHeight.

Example

Code:

import java.io.FileInputStream;

import org.apache.poi.ss.usermodel.*;

class ReadExcelOnlyVisibleRows {

 public static void main(String[] args) throws Exception {

  Workbook workbook  = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));

  DataFormatter dataFormatter = new DataFormatter();

  CreationHelper creationHelper = workbook.getCreationHelper();

  FormulaEvaluator formulaEvaluator = creationHelper.createFormulaEvaluator();

  Sheet sheet = workbook.getSheetAt(0);

  for (Row row : sheet) {
   if (!row.getZeroHeight()) { // if row.getZeroHeight() is true then this row is not visible
    for (Cell cell : row) {
     String cellContent = dataFormatter.formatCellValue(cell, formulaEvaluator);
     System.out.print(cellContent + "\t");
    }
    System.out.println();
   }
  }

  workbook.close();

 }
}

Result:

F1    F2    F3      F4  
V2    2     2-Mai   FALSE   
V4    4     4-Mai   FALSE   
V2    6     6-Mai   FALSE   
V4    8     8-Mai   FALSE   
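
As a rough sketch, the same check could be dropped into an index-based loop like the one in the question, where sheet.getRow can return null for rows that were never created. The class name VisibleRowsSketch, the helpers buildSample and readVisible, and the sample data are all made up for illustration. The hidden rows are simulated with Row.setZeroHeight(true) on an in-memory workbook instead of a real filter applied in Excel.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class VisibleRowsSketch {

    // Hypothetical sample: a header row plus four data rows. The 2018 rows are
    // marked zero-height to stand in for rows hidden by a "Year = 2019" filter.
    static Workbook buildSample() {
        Workbook wb = new XSSFWorkbook();
        Sheet sheet = wb.createSheet("data");
        String[][] rows = {
            {"Title", "Year"}, {"A", "2019"}, {"B", "2018"}, {"C", "2019"}, {"D", "2018"}
        };
        for (int r = 0; r < rows.length; r++) {
            Row row = sheet.createRow(r);
            for (int c = 0; c < rows[r].length; c++) {
                row.createCell(c).setCellValue(rows[r][c]);
            }
            if ("2018".equals(rows[r][1])) {
                row.setZeroHeight(true);
            }
        }
        return wb;
    }

    // Returns one comma-joined line per visible data row, skipping the header
    // (row 0), missing rows, and rows that report zero height.
    static List<String> readVisible(Sheet sheet) {
        DataFormatter formatter = new DataFormatter();
        List<String> lines = new ArrayList<>();
        for (int r = 1; r <= sheet.getLastRowNum(); r++) {
            Row row = sheet.getRow(r);
            if (row == null || row.getZeroHeight()) {
                continue; // never created, or hidden
            }
            StringBuilder line = new StringBuilder();
            for (Cell cell : row) {
                if (line.length() > 0) {
                    line.append(',');
                }
                line.append(formatter.formatCellValue(cell));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        try (Workbook wb = buildSample()) {
            for (String line : readVisible(wb.getSheetAt(0))) {
                System.out.println(line);
            }
        }
    }
}
```

With the sample data this prints only the two 2019 rows. As far as I can tell, getZeroHeight reports any hidden row, so rows that were hidden by hand rather than by a filter would be skipped as well.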

Answer 1 (score: 0)

You have to use the auto filter provided by the Apache POI library and also set a freeze pane. Below is a short code snippet that you can adapt as needed.

XSSFSheet sheet = wBook.getSheet(CommonConstants.METADATA_WORKSHEET);
sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, numColumns));
sheet.createFreezePane(0, 1);

Answer 2 (score: 0)

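I had to override a few hooks and come up with my own way of filtering out hidden rows so that they are not processed. The code snippet is below. My approach opens a second copy of the same sheet so that I can look up the row currently being processed and check whether it is hidden. The answer above touches on this; what follows expands on it to show how to fit it into the Spring Batch Excel framework.

One drawback is that you have to open a second copy of the same file. I could not find a way (maybe there isn't one!) to get at the workbook sheet used internally, among other reasons because org.springframework.batch.item.excel.poi.PoiSheet is package-private. Note that the syntax below is Groovy!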

  /**
   * Produces a reader that knows how to ingest a file in excel format.
   */
  private PoiItemReader<String[]> createExcelReader(String filePath) {
    File f = new File(filePath)
    PoiItemReader<String[]> reader = new PoiItemReader<>()
    reader.setRowMapper(new PassThroughRowMapper())
    Resource resource = new DefaultResourceLoader().getResource("file:" + f.canonicalPath)
    reader.setResource(resource)
    reader.setRowSetFactory(new VisibleRowsOnlyRowSetFactory(resource))
    reader.open(new ExecutionContext())
    reader
  }

...

// The "hooks" I overwrote to inject my logic

  static class VisibleRowsOnlyRowSet extends DefaultRowSet {
    Workbook workbook
    Sheet sheet

    VisibleRowsOnlyRowSet(final Sheet sheet, final RowSetMetaData metaData) {
      super(sheet, metaData)
    }

    VisibleRowsOnlyRowSet(final Sheet sheet, final RowSetMetaData metaData, Workbook workbook) {
      this(sheet, metaData)
      this.workbook = workbook
      this.sheet = sheet
    }

    boolean next() {
      boolean moreLeft = super.next()
      if (moreLeft) {
        Row row = workbook.getSheet(sheet.name).getRow(getCurrentRowIndex())
        if (row?.getZeroHeight()) {
          log.warn("Row $currentRow is hidden in input excel sheet, will omit it from output.")
          currentRow.eachWithIndex { _, int i ->
            currentRow[i] = ''
          }
        }
      }
      moreLeft
    }
  }

  static class VisibleRowsOnlyRowSetFactory extends DefaultRowSetFactory {
    Workbook workbook

    VisibleRowsOnlyRowSetFactory(Resource resource) {
      this.workbook = WorkbookFactory.create(resource.inputStream)
    }

    RowSet create(Sheet sheet) {
      new VisibleRowsOnlyRowSet(sheet, super.create(sheet).metaData, workbook)
    }
  }