I want to convert an xlsx file to a CSV file with ^ (caret) as the delimiter

Time: 2017-09-18 19:31:47

Tags: java apache-poi

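I have a huge Excel file with 50k+ rows and 400+ columns. I am trying to write Java code to export it to a CSV file, but it does not work (it fails with heap and stack errors).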

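I then used a macro to split the Excel file into files of 5k rows each, and with those the CSV file was generated successfully, but some data does not appear in it. I checked this in Excel via Data > Get External Data > From Text, importing the CSV. After applying a filter and going through all the values, a (Blanks) option shows up: some rows did not get their data into the CSV file.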

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class ExcelToCSV {

    // Header line (all column names separated by ^), written once at the top of the CSV
    static final String HEADER = "Here are all my column names, separated by ^";

    // For storing data into the CSV file: every row is collected in memory first
    static List<StringBuilder> dataList = new ArrayList<>();

    static void convertXlsxToCSVF(File inputFile) {
        // Get the workbook instance for the XLSX file; try-with-resources closes it
        try (XSSFWorkbook wb = new XSSFWorkbook(inputFile.getAbsolutePath())) {

            // Get the first sheet from the workbook
            XSSFSheet sheet = wb.getSheetAt(0);

            // Iterate through each row of the first sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                StringBuilder cellValue = new StringBuilder();

                // For each row, iterate through the cells that are physically present
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    String test;
                    switch (cell.getCellType()) {
                    case Cell.CELL_TYPE_BOOLEAN:
                        test = String.valueOf(cell.getBooleanCellValue());
                        break;
                    case Cell.CELL_TYPE_NUMERIC:
                        test = String.valueOf(cell.getNumericCellValue());
                        break;
                    case Cell.CELL_TYPE_STRING:
                        test = cell.getStringCellValue().trim();
                        break;
                    case Cell.CELL_TYPE_BLANK:
                        test = "";
                        break;
                    default:
                        test = cell.toString();
                    }
                    // Line breaks inside a cell would split the CSV record, so replace them
                    test = test.replaceAll("[\\r\\n]+", " ");
                    cellValue.append(test).append('^');
                }

                // Skip the header row repeated at the top of each split file
                if (!cellValue.toString().equalsIgnoreCase(HEADER)) {
                    dataList.add(cellValue);
                }
            }
        } catch (Exception e) {
            System.err.println("Exception: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        File inputDir = new File("C:/Users/TSR/Desktop/test/");
        //File inputFile = new File("C:/Users/TSR/Desktop/ETL/TSR.xlsx");

        // Only the split .xlsx files; the output CSV is written to the same folder
        File[] flist = inputDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".xlsx"));
        if (flist == null) {
            System.err.println("Not a readable directory: " + inputDir);
            return;
        }

        System.out.println("xlsx file generating --->");
        dataList.add(new StringBuilder(HEADER));
        for (int i = 0; i < flist.length; i++) {
            // Convert each split file in turn
            convertXlsxToCSVF(flist[i]);
            System.out.println("generated file :: " + i);
        }

        File outputFile = new File("C:/Users/TSR/Desktop/test/TSR.csv");
        // FileWriter(file, true) appends if the file exists and creates it otherwise
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile, true))) {
            for (StringBuilder line : dataList) {
                bw.write(line.toString());
                bw.write("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("csv file generated successfully");
    }
}
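One likely reason for the blank fields is that `row.cellIterator()` visits only the cells physically present in the file: a never-written cell is skipped rather than reported as blank, so every later value in that row shifts one column to the left. A carriage return left inside a cell can likewise split a record when the CSV is imported. Below is a minimal, POI-free sketch of building a line that keeps every column in place. It assumes the row has already been read into a map from zero-based column index to text; `CaretRow`, `toCaretLine` and `columnCount` (for example the number of header columns) are hypothetical names. With POI, the same effect would come from looping `c` from 0 to the column count and calling `row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK)` (POI 3.15 and later), with `DataFormatter.formatCellValue(cell)` giving the text as Excel displays it.

```java
import java.util.Map;

public class CaretRow {

    /**
     * Builds one caret-delimited line. cells maps a zero-based column index to its
     * text (hypothetical input shape). Columns missing from the map become empty
     * fields, so later values stay in their own column instead of shifting left.
     * Values that themselves contain '^' are not escaped in this sketch.
     */
    public static String toCaretLine(Map<Integer, String> cells, int columnCount) {
        StringBuilder line = new StringBuilder();
        for (int col = 0; col < columnCount; col++) {
            if (col > 0) {
                line.append('^');
            }
            String value = cells.getOrDefault(col, "");
            // A raw CR or LF inside a cell would start a new CSV record, so flatten it
            line.append(value.replaceAll("[\\r\\n]+", " "));
        }
        return line.toString();
    }
}
```

For example, a row whose only cells are column 0 (`"id"`) and column 2 (`"a\r\nb"`) becomes `id^^a b^` with `columnCount` 4, so the column-2 value stays in the third field.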

2 Answers:

Answer 0 (score: 1)

You are using the POI user model, which reads the entire sheet into memory. Don't. Use the POI event model instead.

Also, you are building the result in memory. Don't. Write each row out as you process it.
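Writing rows out as they are produced can look like the minimal sketch below. `StreamingCaretWriter` and `writeRows` are hypothetical names, and each row is assumed to arrive as a `List<String>`. The method itself holds only the current row; how much input sits in memory depends on the iterator the caller supplies.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

public class StreamingCaretWriter {

    /**
     * Writes each row to out as soon as it is taken from the iterator instead of
     * collecting all rows in a List first. Returns the number of rows written.
     */
    public static int writeRows(Iterator<List<String>> rows, Writer out) throws IOException {
        int count = 0;
        while (rows.hasNext()) {
            List<String> row = rows.next();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.write('^');
                }
                // Flatten line breaks so one spreadsheet row stays one CSV line
                out.write(row.get(i).replaceAll("[\\r\\n]+", " "));
            }
            out.write('\n');
            count++;
        }
        out.flush();
        return count;
    }
}
```

In the question's program this would mean opening the `BufferedWriter` once before the loop over the split files and passing it down, instead of filling `dataList`.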

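Stream both the input (POI event model) and the output (using a Writer), and the memory footprint stays small and nearly constant: no matter how large the Excel document is, you will not run out of memory.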
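For the input side, the answer recommends POI's event API (for XLSX, `XSSFReader` together with `XSSFSheetXMLHandler` in `org.apache.poi.xssf.eventusermodel`). As a dependency-free illustration of the same streaming idea, here is a sketch that reads the sheet XML with the JDK's StAX parser instead of POI; the technique shown is plain StAX, not the POI event model. It assumes the caller has already opened the two parts inside the .xlsx zip, typically `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml` (for example via `java.util.zip.ZipFile`; the sheet part name is an assumption). Only the shared-string table and the current row are kept in memory. `StaxSheetToCaret` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSheetToCaret {

    /** Loads every <si> of sharedStrings.xml; rich-text <t> runs are concatenated (phonetic runs are not filtered). */
    public static List<String> readSharedStrings(InputStream in) throws XMLStreamException {
        List<String> strings = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        StringBuilder current = null;
        boolean inText = false;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals("si")) current = new StringBuilder();
                else if (r.getLocalName().equals("t")) inText = true;
            } else if (ev == XMLStreamConstants.CHARACTERS && inText && current != null) {
                current.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                if (r.getLocalName().equals("t")) inText = false;
                else if (r.getLocalName().equals("si")) strings.add(current.toString());
            }
        }
        r.close();
        return strings;
    }

    /** Streams sheet XML, writing one caret-delimited line per <row> as soon as the row ends. */
    public static int streamSheet(InputStream sheetXml, List<String> shared, Writer out)
            throws XMLStreamException, IOException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
        List<String> row = new ArrayList<>(); // values of the current row only
        String type = null;                   // t attribute of the current <c>: "s", "inlineStr", ...
        int col = -1;                         // zero-based column of the current <c>
        StringBuilder text = null;            // text of the current <v> or inline <t>
        int rows = 0;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("row")) {
                    row.clear();
                    col = -1;
                } else if (name.equals("c")) {
                    String ref = r.getAttributeValue(null, "r"); // e.g. "C7"; optional in the format
                    col = ref != null ? columnIndex(ref) : col + 1;
                    type = r.getAttributeValue(null, "t");
                } else if (name.equals("v") || name.equals("t")) {
                    text = new StringBuilder();
                }
            } else if (ev == XMLStreamConstants.CHARACTERS && text != null) {
                text.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("v") || name.equals("t")) {
                    String value = text.toString();
                    if ("s".equals(type)) {
                        value = shared.get(Integer.parseInt(value.trim())); // index into shared strings
                    }
                    while (row.size() < col) {
                        row.add(""); // keep skipped columns as empty fields
                    }
                    if (row.size() == col) row.add(value);
                    else row.set(col, row.get(col) + value); // further rich-text run of the same cell
                    text = null;
                } else if (name.equals("row")) {
                    out.write(String.join("^", row).replaceAll("[\\r\\n]+", " "));
                    out.write('\n');
                    rows++;
                }
            }
        }
        r.close();
        return rows;
    }

    /** Converts the letters of a cell reference to a zero-based column index: "A" -> 0, "AA" -> 26. */
    public static int columnIndex(String ref) {
        int col = 0;
        for (int i = 0; i < ref.length() && Character.isLetter(ref.charAt(i)); i++) {
            col = col * 26 + (Character.toUpperCase(ref.charAt(i)) - 'A' + 1);
        }
        return col - 1;
    }
}
```

Numbers, dates and booleans come out as the raw stored values (a date is a serial number), and rows that are absent from the XML are not reproduced as empty lines. POI's `XSSFSheetXMLHandler` can apply cell formatting through a `DataFormatter`, which is one reason to prefer it over hand-rolled parsing.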

Answer 1 (score: 0)

You may need to replace `String test` with `StringBuilder test` and rework your code around it, because every operation such as `test = test.replaceAll("\n", " ");` creates another String in memory. Hence the heap problem.
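Applied to the question's loop, the suggestion amounts to appending each cell straight into the row's `StringBuilder` and substituting line breaks character by character, so no temporary `String` is created for the replacement. A minimal sketch, with `InPlaceFlatten` and `appendFlattened` as hypothetical names:

```java
public class InPlaceFlatten {

    /**
     * Appends value to row, turning each CR or LF into one space as it goes
     * (so "\r\n" becomes two spaces), then appends the '^' delimiter.
     * No intermediate String is created for the replacement.
     */
    public static StringBuilder appendFlattened(StringBuilder row, CharSequence value) {
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            row.append(ch == '\n' || ch == '\r' ? ' ' : ch);
        }
        return row.append('^');
    }
}
```

This removes only short-lived temporaries, which the garbage collector reclaims quickly; in the question's code, the memory that stays allocated is the fully loaded workbook and the rows accumulated in `dataList`.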