I am parsing a 2 GB CSV file with the Apache Commons CSV library, but I am running into heap memory problems.
Error: nested exception is java.lang.OutOfMemoryError: Java heap space
Reader reader = new InputStreamReader(inputStream);
List<SiebelRecord> siebelRecords = new ArrayList<>();
CSVParser csvParser = null;
try {
    csvParser = new CSVParser(reader, CSVFormat.DEFAULT
            .withEscape('/')
            .withFirstRecordAsHeader()
            .withDelimiter('|')
            .withIgnoreHeaderCase()
            .withTrim());
    // getRecords() reads every record of the file into memory at once
    List<CSVRecord> recordList = csvParser.getRecords();
    siebelRecords = recordList.stream().sequential()
            .map(csvRecord -> new SiebelRecord(
                    csvRecord.get("CUSTOMER_ID"), csvRecord.get("CUSTOMER_NAME"),
                    csvRecord.get("CUSTOMER_ORG"), csvRecord.get("CUSTOMER_PIN"),
                    csvRecord.get("CUSTOMER_TYPE"), csvRecord.get("CUSTOMER_STATUS"),
                    csvRecord.get("CUSTOMER_DOM"), csvRecord.get("BILLING_ID"),
                    csvRecord.get("BILLING_NAME"), csvRecord.get("BILLING_NUMBER"),
                    csvRecord.get("BILLING_STATUS"), csvRecord.get("BILLING_PIN"),
                    csvRecord.get("BILLING_ACCOUNT_TYPE"), csvRecord.get("BILLING_METHOD"),
                    csvRecord.get("BILLING_TYPE"), csvRecord.get("SERVICE_ID"),
                    csvRecord.get("SERVICE_TYPE"), csvRecord.get("CONNECTION_STATUS"),
                    csvRecord.get("SERVICE_PIN"), csvRecord.get("PRIMARY_SERVICE_ID"),
                    csvRecord.get("ROOT_ASSET_ID"), csvRecord.get("PRODUCT_NAME"),
                    csvRecord.get("CONNECTION_NAME"), csvRecord.get("SECONDARY_SERVICE_ID")))
            .collect(Collectors.toList());
} finally {
    inputStream.close();
    reader.close();
    if (csvParser != null) {
        csvParser.close();
    }
}
Is there a property I am missing, or is this a problem with the library?
Answer 0 (score: -1)
My answer is only partly about increasing heap space: rather than loading the whole file into the JVM, I would parse it line by line. Here is a link to a similar question, where the last answer demonstrates a way to process large files with a buffered reader: How can I process a large file via CSVParser?
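For reference, here is a minimal sketch of that record-by-record approach using Commons CSV itself, reusing the format options from the question; the file path and the process(...) handler are hypothetical stand-ins. CSVParser implements Iterable<CSVRecord>, so iterating the parser pulls one record at a time from the underlying reader instead of materializing the whole file the way getRecords() does.

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class StreamingCsvExample {
    public static void main(String[] args) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT
                .withEscape('/')
                .withFirstRecordAsHeader()
                .withDelimiter('|')
                .withIgnoreHeaderCase()
                .withTrim();
        // try-with-resources closes both the reader and the parser, even on failure
        try (Reader reader = Files.newBufferedReader(Paths.get("siebel.csv")); // hypothetical path
             CSVParser csvParser = new CSVParser(reader, format)) {
            // CSVParser is Iterable<CSVRecord>: each iteration reads a single
            // record from the stream, so memory use stays flat regardless of file size.
            for (CSVRecord csvRecord : csvParser) {
                process(csvRecord);
            }
        }
    }

    // Hypothetical sink: build a SiebelRecord here (as in the question) and
    // handle it immediately instead of collecting everything into a List.
    private static void process(CSVRecord csvRecord) {
        System.out.println(csvRecord.get("CUSTOMER_ID"));
    }
}

If each row still has to become a SiebelRecord, map it inside the loop and hand it off right away (write it to a database, a queue, a file, etc.) so each object stays eligible for garbage collection instead of accumulating in a 2 GB list.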