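I have a huge Excel file with 50k+ rows and 400+ columns. I am trying to write Java code to export it to a CSV file, but it does not work (it fails with heap and stack errors).
I then used a macro to split the Excel file into chunks of 5k rows. With that, the CSV file is generated successfully, but some data does not appear in it. I verified this in the Excel application by importing the CSV via Data > Get External Data > From Text. When I apply a filter to the imported data, it shows a blank option, so some rows did not get their data in the CSV file.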
import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
class ExcelToCSV {
    static List<StringBuilder> dataList = new ArrayList();

    static void convertXlsxToCSVF(File inputFile)
    {
        // For storing data into CSV files
        try
        {
            // Get the workbook instance for XLSX file
            XSSFWorkbook wb = new XSSFWorkbook(inputFile.getAbsolutePath());
            // Get first sheet from the workbook
            XSSFSheet sheet = wb.getSheetAt(0);
            // Iterate through each rows from first sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext())
            {
                Row row;
                Cell cell;
                StringBuilder cellValue = new StringBuilder();
                row = rowIterator.next();
                // For each row, iterate through each columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext())
                {
                    cell = cellIterator.next();
                    String test = null;
                    switch (cell.getCellType())
                    {
                        case Cell.CELL_TYPE_BOOLEAN:
                            test = String.valueOf(cell.getBooleanCellValue());
                            test = test.replaceAll("\n", " ");
                            cellValue.append(test + "^");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            test = String.valueOf(cell.getNumericCellValue());
                            test = test.replaceAll("\n", " ");
                            cellValue.append( test+ "^");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            test = cell.getStringCellValue().toString().trim();
                            test = test.replaceAll("\n", " ");
                            cellValue.append( test + "^");
                            break;
                        case Cell.CELL_TYPE_BLANK:
                            cellValue.append("" + "^");
                            break;
                        default:
                            cellValue.append(cell + "^");
                    }
                }
                if(cellValue.toString().equalsIgnoreCase("Here is my all columns name with ceperated ^")){
                    continue;
                }else{
                    dataList.add(cellValue);
                }
                cellValue = null;
            }
        }
        catch (Exception e)
        {
            System.err.println("Exception :" + e.getMessage());
        }
        finally{
            System.gc();
        }
    }
    public static void main(String[] args)
    {
        File inputFile = new File("C:/Users/TSR/Desktop/test/");
        //File inputFile = new File("C:/Users/TSR/Desktop/ETL/TSR.xlsx");
        File[] flist = inputFile.listFiles();
        System.out.println("xlsx file generating --->");
        StringBuilder b= new StringBuilder("Here is my all columns name with ceperated ^");
        dataList.add(b);
        for(int i=0;i<flist.length;i++){
            File dataFile = new File(flist[i].getAbsolutePath());
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    // TODO Auto-generated method stub
                    convertXlsxToCSVF(dataFile);
                }
            });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            System.out.println("generated file :: "+ i);
        }
        try{
            File outputFile = new File("C:/Users/TSR/Desktop/test/TSR.csv");
            BufferedWriter bw;
            if(outputFile.exists()){
                bw = new BufferedWriter(new FileWriter(outputFile,true));
            }else{
                bw = new BufferedWriter(new FileWriter(outputFile));
            }
            for(int i=0;i<dataList.size();i++){
                bw.write(dataList.get(i).toString());
                bw.write("\n");
            }
            bw.close();
        }catch(Exception e){
            e.printStackTrace();
        }
        System.out.println("csv file generated successfully");
    }
}
Answer 0 (score: 1)
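You are using the POI user model, which reads the entire worksheet into memory. Don't. Use the POI event model instead.
In addition, you are building the result in memory. Don't. Write each row out as you process it.
By streaming both the input (POI event model) and the output (using a Writer), your memory footprint stays small, and you will not run out of memory no matter how large the Excel document is.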
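The output half of this advice can be sketched with the standard library alone. The class below is a hypothetical helper (the name `StreamingRowWriter` and its methods are not part of POI or of the question's code): it writes each row to a `Writer` as soon as the row is complete, so no list of rows accumulates on the heap. It assumes the `^` delimiter from the question but places it between cells rather than after every cell. It wraps `IOException` in `UncheckedIOException` on the assumption that it will be called from a parser callback that cannot throw checked exceptions, for example the `endRow` callback of POI's `XSSFSheetXMLHandler.SheetContentsHandler`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical helper: streams rows to a Writer instead of collecting them in memory. */
final class StreamingRowWriter {
    private final Writer out;
    private final char delimiter;

    StreamingRowWriter(Writer out, char delimiter) {
        this.out = out;
        this.delimiter = delimiter;
    }

    /** Writes one row immediately; null cells are written as empty fields. */
    void writeRow(List<String> cells) {
        try {
            for (int i = 0; i < cells.size(); i++) {
                if (i > 0) {
                    out.write(delimiter);
                }
                String value = cells.get(i);
                if (value != null) {
                    // Keep each record on one line, as the question's code does.
                    out.write(value.replace('\n', ' '));
                }
            }
            out.write('\n');
        } catch (IOException e) {
            // Rethrow unchecked so the method can be called from callbacks
            // that do not declare checked exceptions.
            throw new UncheckedIOException(e);
        }
    }
}
```

In the question's program, the `Writer` would be the `BufferedWriter` over the output file, opened once before the first input file is processed and closed after the last one. The caller would reuse one small list per row and clear it after each `writeRow` call.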
Answer 1 (score: 0)
You may need to replace String test with StringBuilder test and rework your code around that, because every operation like test = test.replaceAll("\n", " "); creates another string in memory. Hence the heap problem.
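A minimal sketch of that idea, using a hypothetical helper named `CellAppender` (not from the question's code): the cell text is appended to the row's `StringBuilder` first, and newlines are then replaced in place with `setCharAt`, so no intermediate `String` is created by `replaceAll` or by `test + "^"`. This only removes short-lived temporaries per cell. Whether it is enough on its own to avoid the heap error is not established here.

```java
/** Hypothetical helper: appends a cell to a row buffer without temporary strings. */
final class CellAppender {
    private CellAppender() {
    }

    /** Appends value (if not null), replaces its newlines in place, then appends the delimiter. */
    static void appendCell(StringBuilder row, String value, char delimiter) {
        int start = row.length();
        if (value != null) {
            row.append(value);
        }
        // Only scan the characters that were just appended.
        for (int i = start; i < row.length(); i++) {
            if (row.charAt(i) == '\n') {
                row.setCharAt(i, ' ');
            }
        }
        row.append(delimiter);
    }
}
```

Each `case` branch in the question's `switch` could then call `CellAppender.appendCell(cellValue, text, '^')` instead of building `test` and concatenating.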