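Aggregating data in a CSV file using Java

Time: 2015-06-25 05:03:03

Tags: java csv

I have a large CSV file with thousands of rows, and I want to aggregate some of its columns using Java code.

The file looks like this: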

1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1

The result should be:

T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1

4 answers:

Answer 0 (score: 0)

Put the data into a map-like structure, and each time you encounter a key (in your case, "" + T + Year), add 1 to the value stored for it.
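A minimal sketch of that idea, assuming Java 8 or later (for `Map.merge`) and assuming every line has the form `id,year,T` with no quoted commas. The class and method names (`CsvCounter`, `countByTAndYear`) are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CsvCounter {

    // Counts rows per (T, Year) pair.
    // Assumption: each line is "id,year,T" and fields contain no quoted commas.
    static Map<String, Integer> countByTAndYear(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String line : lines) {
            String[] fields = line.split(",");
            // Key is T + Year; the comma separator avoids ambiguous concatenations.
            String key = fields[2] + "," + fields[1];
            // Store 1 for a new key, otherwise add 1 to the stored value.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1");
        System.out.println("T, Year, Count");
        countByTAndYear(lines).forEach((key, count) -> System.out.println(key + ", " + count));
    }
}
```

For a real file, the lines could come from a `BufferedReader` instead of an in-memory list; the counting loop stays the same.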

Answer 1 (score: 0)

You can use a Map:

Map<String, Integer> rowMap = new HashMap<>();
// Entries are added with put(); a Map cannot be called like a method.
rowMap.put("T1", 1);
rowMap.put("T2", 2);
rowMap.put("2012", 1);

Alternatively, you can define your own class with T and Year fields and override its hashCode and equals methods. Then you can use

Map<YourClass, Integer> map= new HashMap<>();  

T1,2012,2
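Here is a rough sketch of what such a key class could look like. `TYear` and `KeyClassExample` are hypothetical names, and the code again assumes Java 8+ and lines of the form `id,year,T`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyClassExample {

    // Hypothetical key class holding the T and Year fields.
    static final class TYear {
        final String t;
        final int year;

        TYear(String t, int year) {
            this.t = t;
            this.year = year;
        }

        // Two keys are equal when both fields match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TYear)) return false;
            TYear other = (TYear) o;
            return year == other.year && t.equals(other.t);
        }

        // Must be consistent with equals so HashMap lookups find the same bucket.
        @Override
        public int hashCode() {
            return Objects.hash(t, year);
        }

        @Override
        public String toString() {
            return t + "," + year;
        }
    }

    // Assumption: each line is "id,year,T" with a numeric year and no quoted commas.
    static Map<TYear, Integer> count(String[] lines) {
        Map<TYear, Integer> map = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            map.merge(new TYear(f[2], Integer.parseInt(f[1])), 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] lines = {"1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1"};
        Map<TYear, Integer> map = count(lines);
        System.out.println(map.get(new TYear("T1", 2012))); // prints 2
    }
}
```

Compared with a concatenated String key, a dedicated class keeps the year typed as a number and cannot confuse field boundaries, at the cost of a little more code.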

Answer 2 (score: 0)

    String csv =
           "1,2012,T1\n"
         + "2,2015,T2\n"
         + "3,2013,T1\n"
         + "4,2012,T1\n";
    Map<String, Integer> map = new TreeMap<>();
    BufferedReader reader = new BufferedReader(new StringReader(csv));
    String line;
    while ((line = reader.readLine()) != null) {
        String[] fields = line.split(",");
        String key = fields[2] + "," + fields[1];
        Integer value = map.get(key);
        if (value == null)
            value = 0;
        map.put(key, value + 1);
    }
    System.out.println(map);
    // -> {T1,2012=2, T1,2013=1, T2,2015=1}

Answer 3 (score: 0)

Use uniVocity-parsers for the best performance. Processing 1 million rows takes about 1 second.

    CsvParserSettings settings = new CsvParserSettings();
    settings.selectIndexes(1, 2); //select the columns we are going to read

    final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here

    //Use a custom implementation of RowProcessor
    settings.setRowProcessor(new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.

            Integer count = results.get(key);
            if (count == null) {
                count = 0;
            }
            results.put(key, count + 1);
        }
    });

    //creates a parser with the above configuration and RowProcessor
    CsvParser parser = new CsvParser(settings);

    String input = "1,2012,T1"
            + "\n2,2015,T2"
            + "\n3,2013,T1"
            + "\n4,2012,T1";

    //the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead of the String I'm using as an example.
    parser.parse(new StringReader(input));

    //Here are the results:
    for(Entry<List<String>, Integer> entry : results.entrySet()){
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }

Output:

[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1

Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).