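Slow CSV row parsing and splitting

Date: 2017-11-13 23:17:14

Tags: java performance csv

I am trying to parse a CSV with more than 100,000 lines. The performance problems keep me from even reaching the end of the file before I hit: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Is there anything wrong here, or some way I could improve this?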

public static List<String[]> readCSV(String filePath) throws IOException{
    List<String[]> csvLine= new ArrayList<String[]>();
    CSVReader reader = new CSVReader(new FileReader(filePath), '\n');
    String[] row;

    while((row = reader.readNext()) != null){
        csvLine.add(removeWhiteSpace(row[0].toString().split(",")));
    }

    reader.close();
    return csvLine;
}

private static String[] removeWhiteSpace(String[] split) {
    for(int index =0; index < split.length;index++){
        split[index] = split[index].trim();
    }
    return split;
}

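2 answers:

Answer 0: (score: 1)

First, you are running out of memory because every row is being added to a list.

Second, you are using String.split(), which is extremely slow.

Third, never try to process CSV by writing your own parsing code, because there are many edge cases around this format (you need to handle escaping of delimiters, quotes, and so on).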
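The third point is easy to reproduce with the JDK alone. The sketch below (the class and method names are made up for illustration) splits one record whose quoted field contains a comma, the same way the code in the question does:

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // Splits a CSV line on every comma, as the question's code does.
    // Hypothetical helper, for illustration only.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // One record with three fields; the second field is quoted and contains a comma.
        String line = "1,\"Smith, John\",42";
        String[] parts = naiveSplit(line);

        // Print how many pieces the naive split produced, and the pieces themselves.
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The record has three fields, but the naive split returns four pieces, because the comma inside the quotes is treated as a delimiter.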

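The solution is to use a library for this, such as univocity-parsers. You should be able to read 1 million rows in less than a second.

To parse, have readCSV return a row iterator (an IterableResult, for example from the parser's iterate method) instead of a List, and loop over it like this: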

public static void main(String... args) {
    IterableResult<String[], ParsingContext> rows = readCSV("c:/path/to/input.csv");

    try {
        for (String[] row : rows) {
            //process the rows however you want
        }
    } finally {
        //the parser closes itself, but if an error occurs while processing the rows (outside the iterator's control), stop the parser explicitly.
        rows.getContext().stop();
    }
}

To write the rows to another file as you read them, you can do this:

public static void main(String... args) {
    //this is your output file
    File output = new File("c:/path/to/output.csv");

    //configure the writer if you need to
    CsvWriterSettings settings = new CsvWriterSettings();

    //create the writer. Here we write to a file
    CsvWriter writer = new CsvWriter(output, settings);

    //get the row iterator
    IterableResult<String[], ParsingContext> rows = readCSV("c:/temp");

    try {
        //do whatever you need to the rows here
        for (String[] row : rows) {
            //then write each one to the output.
            writer.writeRow(row);
        }
    } finally {
        //cleanup
        rows.getContext().stop();
        writer.close();
    }
}

This is just one example of how you can use the parser; there are many other ways to use it.

If all you want is to read the data, modify it and write it back to another file, you can do this:

public static void main(String... args) throws IOException {
    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setProcessor(new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            //modify the row data here.
        }
    });

    CsvWriterSettings writerSettings = new CsvWriterSettings();
    CsvRoutines routines = new CsvRoutines(parserSettings, writerSettings);

    FileReader input = new FileReader("c:/path/to/input.csv");
    FileWriter output = new FileWriter("c:/path/to/output.csv");

    routines.parseAndWrite(input, output);
}

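Hope this helps.

Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).

Answer 1: (score: -1)

It is a design mistake to try to fit such a large file in memory. Depending on your requirements, you should either write the processed rows to a new file or put them into a database. This is the starting point for the first option, reading the file line by line: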

// path is the location of the input file
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here instead of storing it
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}
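To show the "write a new processed file" option end to end, here is a minimal sketch using only the JDK. The class name, method name and the trimming step are assumptions made for illustration, and it treats every physical line as one record, so it does not handle quoted fields that contain line breaks:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingCopy {
    // Reads the input one line at a time, trims it, and writes it to the output.
    // Only the current line is held in memory. Returns the number of lines processed.
    static long copyTrimmed(Path input, Path output) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.trim()); // stand-in for the real per-row processing
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StreamingCopy <input> <output>");
            return;
        }
        long lines = copyTrimmed(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("processed " + lines + " lines");
    }
}
```

Because nothing is accumulated in a list, memory use stays flat regardless of how many rows the file has.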