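I have a Java application that uses openCSV to read a (very large) file. I then put the 4th column (eventually one or two more columns will be added, if that makes a difference) into a HashSet and write it out to a new file. This all seems to work fine, but I found that it only reads part of the file (131,544 of 272,948 lines). Is this a limitation of openCSV or of Java in general, or is there a way around it?
My code for reference: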
    public static void main(String[] args) throws IOException {
        String itemsFile = new String();
        String outFile = new String();
        itemsFile = "items.txt";
        outFile = "so.txt";
        CSVReader reader = null;
        try {
            reader = new CSVReader(new FileReader(itemsFile), '\t');
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
        String[] nextLine;
        HashSet<String> brands = new HashSet<>();
        while ((nextLine = reader.readNext()) != null) {
            brands.add(nextLine[4]);
        }
        String[] brandArray = new String[brands.size()];
        Iterator<String> it = ((HashSet<String>) brands).iterator();
        int listNum = 0;
        while (it.hasNext()) {
            Object brand = (Object) it.next();
            brandArray[listNum] = (String) brand;
            listNum++;
        }
        CSVWriter writer = new CSVWriter(new FileWriter(outFile), '\n');
        writer.writeNext(brandArray);
        writer.close();
    }
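Sorry if my code is messy; this is my first real "finished" Java application. Any help is greatly appreciated.
I even tried deleting those lines from the txt file to make sure it wasn't getting hung up on some character or something, but it still seems to stop at that line.
Answer 0: (score: 9)
OK, I worked this out with the help of chat user @Michael. Apparently openCSV can't handle a file this large because it doesn't stream it. So I looked into streaming the file myself, and that worked well.
Here is the final code: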
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    public class BrandExtractor { // class name is arbitrary
        public static void main(String[] args) throws IOException {
            String fileName = "items.txt";
            String outputFile = "so.txt";
            Set<String> brands = new HashSet<>();

            // Stream the file one line at a time; try-with-resources closes the reader.
            try (BufferedReader input = new BufferedReader(new FileReader(fileName))) {
                String thisLine;
                while ((thisLine = input.readLine()) != null) {
                    // Limit -1 keeps trailing empty columns so the indexes stay stable.
                    String[] line = thisLine.split("\t", -1);
                    // Column 2 is only collected when column 20 equals "1".
                    // The length check skips rows that are too short instead of throwing.
                    if (line.length > 20 && line[20].equals("1")) {
                        addIfPresent(brands, line, 2);
                    }
                    addIfPresent(brands, line, 3);
                    addIfPresent(brands, line, 4);
                }
            }

            // Write the distinct values as a single line: 'a','b','c'
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile))) {
                boolean first = true;
                for (String brand : brands) {
                    bw.write((first ? "'" : ",'") + brand + "'");
                    first = false;
                }
            }
        }

        // Adds the column value unless it is missing, empty, or a single space.
        private static void addIfPresent(Set<String> brands, String[] line, int col) {
            if (col >= line.length || line[col].isEmpty() || line[col].equals(" ")) {
                return;
            }
            // The original replace("'", "\'") was a no-op, because "\'" is just "'"
            // in a Java string literal; a backslash escape was presumably intended.
            brands.add(line[col].replace("'", "\\'"));
        }
    }
Thanks to everyone for the help.
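The line-by-line approach in this answer can be condensed into a small helper. The following is only a sketch under assumptions: the class name `DistinctColumnValues`, the method `collect`, and the in-memory sample data are made up for illustration, and blank values are detected with `trim()`, which is slightly broader than the empty-string and single-space checks in the code above.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctColumnValues { // hypothetical name, not from the original post

    // Streams tab-separated lines and collects the distinct, non-blank values
    // found at the given zero-based column indexes. Rows that are too short
    // for a requested column are skipped for that column.
    static Set<String> collect(BufferedReader in, int... columns) {
        Set<String> values = new LinkedHashSet<>();
        in.lines().forEach(line -> {
            // Limit -1 keeps trailing empty fields.
            String[] fields = line.split("\t", -1);
            for (int col : columns) {
                if (col < fields.length && !fields[col].trim().isEmpty()) {
                    values.add(fields[col]);
                }
            }
        });
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for items.txt.
        String sample = "1\tWidget\tAcme\n2\t \tAcme\n3\tGadget\tO'Neil\n4\n";
        BufferedReader in = new BufferedReader(new StringReader(sample));
        System.out.println(collect(in, 1, 2)); // prints [Widget, Acme, Gadget, O'Neil]
    }
}
```

For a real file, the `StringReader` could be swapped for a `FileReader` opened in a try-with-resources block; only one line is held in memory at a time, plus the set of distinct values.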
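Answer 1: (score: 0)
For me, the problem was a bug in OpenCSV 3.4 that occurs when the end of a line coincides with the end of the BufferedReader's buffer.
This test demonstrates the bug: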
    @Test
    void readWithBufferSize() throws IOException {
        for (int bufferSize = 2; bufferSize <= 3; bufferSize++) {
            // A <CR> <LF> B <NULL>
            byte[] content = {65, 13, 10, 66, 0};
            InputStream is = new ByteArrayInputStream(content);
            BufferedReader bfReader = new BufferedReader(new InputStreamReader(is), bufferSize);
            CSVReader reader = new CSVReader(bfReader);
            List<String> rows = new ArrayList<>();
            String[] cols;
            while ((cols = reader.readNext()) != null) {
                rows.add(String.join(",", cols));
            }
            System.out.printf("buffer size: %d rows: %s%n", bufferSize, String.join(",", rows));
            // this fails for bufferSize = 3
            assert (rows.size() == 2);
        }
    }
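As a baseline for the test above, here is a sketch that reads the same five bytes with nothing but the JDK's `BufferedReader`, which reports two lines for both buffer sizes. That is the row count the test expects from the CSV reader. The class name `BaselineLineCount` and the method `countLines` are hypothetical, and the sketch does not touch OpenCSV at all.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BaselineLineCount { // hypothetical name

    // Counts the lines BufferedReader sees for the given bytes and buffer size.
    // The reader wraps an in-memory array, so it is not explicitly closed here.
    static long countLines(byte[] content, int bufferSize) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(content), StandardCharsets.US_ASCII),
                bufferSize);
        return reader.lines().count();
    }

    public static void main(String[] args) {
        // A <CR> <LF> B <NULL>, the same bytes as in the test above
        byte[] content = {65, 13, 10, 66, 0};
        for (int bufferSize = 2; bufferSize <= 3; bufferSize++) {
            System.out.printf("buffer size: %d lines: %d%n",
                    bufferSize, countLines(content, bufferSize));
        }
    }
}
```

Comparing a plain line count like this against the number of rows a CSV parser returns is one way to notice that the parser has dropped or merged records.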