I'm trying to parse a CSV with over 100,000 lines. Performance issues aside, I don't even get to the end of the file before hitting: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Is there anything wrong with my code, or any way I can improve it?
public static List<String[]> readCSV(String filePath) throws IOException {
    List<String[]> csvLine = new ArrayList<String[]>();
    CSVReader reader = new CSVReader(new FileReader(filePath), '\n');
    String[] row;
    while ((row = reader.readNext()) != null) {
        csvLine.add(removeWhiteSpace(row[0].toString().split(",")));
    }
    reader.close();
    return csvLine;
}

private static String[] removeWhiteSpace(String[] split) {
    for (int index = 0; index < split.length; index++) {
        split[index] = split[index].trim();
    }
    return split;
}
Answer 0 (score: 1)
First of all, you are running out of memory because every row is being added to the list.
Second, you are using String.split(), which is extremely slow.
Third, never try to handle CSV by writing your own parsing code, as there are many edge cases around this format (escaping of delimiters, quotes, etc. all need to be handled).
The solution is to use a library, such as univocity-parsers. You should be able to read 1 million rows in less than a second.
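For reference, the parser is a single dependency available from Maven Central; a typical coordinate looks like the following (the version shown is an assumption, pick the latest available):

```xml
<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.9.1</version>
</dependency>
```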
To parse, just do this:
public static void main(String... args) {
    IterableResult<String[], ParsingContext> rows = readCSV("c:/path/to/input.csv");
    try {
        for (String[] row : rows) {
            //process the rows however you want
        }
    } finally {
        //the parser closes itself, but in case of any errors while processing
        //the rows (outside the iterator's control), stop the parser.
        rows.getContext().stop();
    }
}
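The readCSV method called above is not defined anywhere in this answer; a minimal sketch of what it could look like, assuming univocity-parsers' CsvParser.iterate API (the line-separator detection setting is an optional extra):

```java
import com.univocity.parsers.common.IterableResult;
import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.File;

public class CsvRead {
    // Returns a lazy row iterator: rows are parsed on demand,
    // so the whole file is never held in memory at once.
    public static IterableResult<String[], ParsingContext> readCSV(String filePath) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setLineSeparatorDetectionEnabled(true); // handle \n or \r\n
        CsvParser parser = new CsvParser(settings);
        return parser.iterate(new File(filePath));
    }
}
```

Because nothing is accumulated in a list, this avoids the OutOfMemoryError from the question regardless of file size.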
Now, for writing, you can use a method like this:
public static void main(String... args) {
    //this is your output file
    File output = new File("c:/path/to/output.csv");

    //configure the writer if you need to
    CsvWriterSettings settings = new CsvWriterSettings();

    //create the writer. Here we write to a file
    CsvWriter writer = new CsvWriter(output, settings);

    //get the row iterator
    IterableResult<String[], ParsingContext> rows = readCSV("c:/temp");

    try {
        //do whatever you need to the rows here
        for (String[] row : rows) {
            //then write each one to the output
            writer.writeRow(row);
        }
    } finally {
        //cleanup
        rows.getContext().stop();
        writer.close();
    }
}
This is just one example of how to use the parser; there are many different ways to use it.
If all you want is to read the data, modify it, and write it back to another file, you can just do this:
public static void main(String... args) throws IOException {
    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setProcessor(new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            //modify the row data here.
        }
    });

    CsvWriterSettings writerSettings = new CsvWriterSettings();

    CsvRoutines routines = new CsvRoutines(parserSettings, writerSettings);

    FileReader input = new FileReader("c:/path/to/input.csv");
    FileWriter output = new FileWriter("c:/path/to/output.csv");

    routines.parseAndWrite(input, output);
}
Hope this helps.
Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).
Answer 1 (score: -1)
The design error is trying to fit such a large file into memory. Depending on your requirements, you should either write the processed output to a new file, or put the rows into a database. This implements the first option:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
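To also produce the processed output file mentioned above, the same idea can be sketched with try-with-resources, writing each processed line straight to disk so only one line is ever in memory (the file names and the trim step are illustrative, not from the original answers):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamProcess {

    // Reads the input line by line, trims each comma-separated field,
    // and writes the result straight to the output file. Only one line
    // is held in memory at a time, so the file size doesn't matter.
    static void process(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                writer.write(String.join(",", fields));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // illustrative paths; replace with your own
        Path in = Paths.get("input.csv");
        Path out = Paths.get("output.csv");
        Files.write(in, java.util.Arrays.asList(" a , b ,c", "1, 2 ,3"), StandardCharsets.UTF_8);
        process(in, out);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

Note that, like the Scanner version, this does not handle quoted fields or escaped delimiters; for real CSV data a proper parser library is still the safer choice.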