Why does the Apache Commons CSV parser append unique data to the second result set?

Time: 2018-10-14 06:00:26

Tags: java csv set apache-commons-csv

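I have two CSV files in a directory (district1.csv, district2.csv), and each one contains a schoolCode column. I read both files with the Apache Commons CSV library, collect the distinct values of the schoolCode column, and count them. Here is my code: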

public void getDistinctRecordCount() throws IOException {
    // Directory (a File), FileExtensionFilter and Logger are defined elsewhere in the class.
    Set<String> uniqueSchools = new HashSet<>();
    int numOfSchools;
    String SchoolCode;

    // Filter to only read csv files.
    File[] files = Directory.listFiles(new FileExtensionFilter());

    for (File f : files) {
        CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim();
        // f already carries the directory path; try-with-resources closes the reader and parser.
        try (Reader reader = Files.newBufferedReader(f.toPath(), StandardCharsets.ISO_8859_1);
             CSVParser csvParser = CSVParser.parse(reader, csvFormat)) {
            for (CSVRecord column : csvParser) {
                SchoolCode = column.get("School Code");
                uniqueSchools.add(SchoolCode);
            }
        }
        Logger.info("The list of Schools for " + f.getName() + " are: " + uniqueSchools);
        numOfSchools = uniqueSchools.size();
        Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
        Logger.info("-----------------------");
    }
}

Here is my output:

[INFO ] [Logger] - The list of Schools for district1.csv are: [01-0003-002, 01-0003-001]
[INFO ] [Logger] - The total count of Schools for district1.csv are: 2
[INFO ] [Logger] - The list of Schools for district2.csv are: [01-0003-002, 01-0003-001, 01-0018-004, 01-0018-005, 01-0018-002, 01-0018-003, 01-0018-008, 01-0018-006]
[INFO ] [Logger] - The total count of Schools for district2.csv are: 8

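Problem: the two values read from district1.csv are appended to the district2.csv result, which inflates my count for district2.csv by 2 (the correct value is 6). How are they getting added?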
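The accumulation can be reproduced without Commons CSV or any files. This is a hypothetical sketch: the class `SharedSetDemo` and its methods are illustrative names, and each file's "School Code" column is modeled as an in-memory list built from the codes in the log above. Like the question's loop, it creates one set before the loop and reuses it for every file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SharedSetDemo {
    // Hypothetical stand-in for the two CSV files: file name -> "School Code" values.
    static Map<String, List<String>> sampleFiles() {
        Map<String, List<String>> files = new LinkedHashMap<>(); // keeps file order
        files.put("district1.csv", List.of("01-0003-001", "01-0003-002"));
        files.put("district2.csv", List.of("01-0018-002", "01-0018-003", "01-0018-004",
                "01-0018-005", "01-0018-006", "01-0018-008"));
        return files;
    }

    // Mirrors the question's loop: a single set is shared by every iteration.
    static List<Integer> countsWithSharedSet(Map<String, List<String>> files) {
        Set<String> uniqueSchools = new HashSet<>();
        List<Integer> counts = new ArrayList<>();
        for (List<String> codes : files.values()) {
            uniqueSchools.addAll(codes);      // codes from earlier files are still in the set
            counts.add(uniqueSchools.size()); // so this is a running total, not a per-file count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countsWithSharedSet(sampleFiles())); // prints [2, 8]
    }
}
```

The second count is 8 rather than 6 because the set still holds the two district1.csv codes when district2.csv is read.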

1 answer:

Answer 0 (score: 0)

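If you don't need the schools from all files combined, you can move the declaration of uniqueSchools inside the loop, or clear it at the start of each iteration: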

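for (File f : files) {
    uniqueSchools.clear(); // start every file with an empty set
    // ... parse f and add each "School Code" value as before ...
}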
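As a hedged illustration of the `clear()` approach, the sketch below models each file as a hypothetical in-memory list instead of parsing CSV; `ClearPerFileDemo` and its methods are illustrative names, not part of Commons CSV. One code is deliberately repeated in district1.csv to show that duplicates within a single file are still collapsed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClearPerFileDemo {
    // Hypothetical stand-in for the CSV files: file name -> "School Code" values.
    static Map<String, List<String>> sampleFiles() {
        Map<String, List<String>> files = new LinkedHashMap<>();
        // district1.csv repeats one code to show that in-file duplicates are still removed.
        files.put("district1.csv", List.of("01-0003-001", "01-0003-002", "01-0003-001"));
        files.put("district2.csv", List.of("01-0018-002", "01-0018-003", "01-0018-004",
                "01-0018-005", "01-0018-006", "01-0018-008"));
        return files;
    }

    // Same shared set as the question, but emptied before each file is read.
    static List<Integer> countsWithClear(Map<String, List<String>> files) {
        Set<String> uniqueSchools = new HashSet<>();
        List<Integer> counts = new ArrayList<>();
        for (List<String> codes : files.values()) {
            uniqueSchools.clear();            // drop the previous file's codes
            uniqueSchools.addAll(codes);
            counts.add(uniqueSchools.size()); // distinct codes of this file only
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countsWithClear(sampleFiles())); // prints [2, 6]
    }
}
```

Declaring `new HashSet<>()` inside the loop gives the same counts; `clear()` simply reuses one set object.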

You can also store each file's schools in a Map<String, String>, or create a separate set for each file, log its count, and then addAll it into uniqueSchools:

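Set<String> currentSchools = new HashSet<>(); // new set for each file, inside the loop
// ... parse the file as before, but add each code to the per-file set:
currentSchools.add(SchoolCode);
// ... after the file has been read:
Logger.info("The list of Schools for " + f.getName() + " are: " + currentSchools);
numOfSchools = currentSchools.size();
Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
uniqueSchools.addAll(currentSchools); // keep the overall distinct set across all files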
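A minimal sketch of the per-file set plus `addAll`, under the same assumption that each file is a hypothetical in-memory list rather than parsed CSV (`PerFileSetDemo`, `sampleFiles` and `perFileCounts` are illustrative names). It yields the correct per-file counts and still keeps a union of all codes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerFileSetDemo {
    // Hypothetical stand-in for the CSV files: file name -> "School Code" values.
    static Map<String, List<String>> sampleFiles() {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("district1.csv", List.of("01-0003-001", "01-0003-002"));
        files.put("district2.csv", List.of("01-0018-002", "01-0018-003", "01-0018-004",
                "01-0018-005", "01-0018-006", "01-0018-008"));
        return files;
    }

    // Returns each file's distinct count and adds every code to allSchools.
    static List<Integer> perFileCounts(Map<String, List<String>> files, Set<String> allSchools) {
        List<Integer> counts = new ArrayList<>();
        for (List<String> codes : files.values()) {
            Set<String> currentSchools = new HashSet<>(codes); // fresh set for this file only
            counts.add(currentSchools.size());
            allSchools.addAll(currentSchools);                 // running union across files
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> allSchools = new HashSet<>();
        System.out.println(perFileCounts(sampleFiles(), allSchools)); // prints [2, 6]
        System.out.println(allSchools.size());                        // prints 8
    }
}
```

Passing the overall set in as a parameter keeps the per-file logic separate from the cross-file total, which is the same split the answer's snippet makes.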
  • Consider starting variable names with a lowercase letter (camelCase), e.g. rename SchoolCode to schoolCode and Logger to logger.