I have a large CSV file with thousands of rows, and I want to aggregate some of its columns using Java code.
The file looks like this:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The result should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Answer 0: (score: 0)
Put the data into a map-like structure, and each time you find a key (in your case "" + T + year), add 1 to the value stored for it.
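A minimal sketch of that idea, assuming every line has the form id,year,T and Java 9+ (for List.of). The class and method names are made up for illustration, and a comma is placed between T and year in the key so that the two parts cannot run together:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the class and method names are illustrative only.
final class GroupCounter {

    // Counts rows per (T, year) pair. Assumes each line looks like "id,year,T".
    static Map<String, Integer> countByTypeAndYear(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the keys sorted
        for (String line : lines) {
            String[] fields = line.split(",");
            String key = fields[2] + "," + fields[1]; // T + "," + year
            counts.merge(key, 1, Integer::sum);       // store 1, or add 1 to the stored value
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1");
        countByTypeAndYear(lines)
                .forEach((key, count) -> System.out.println(key + "," + count));
    }
}
```

For the sample data in the question this prints T1,2012,2 then T1,2013,1 then T2,2015,1.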
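Answer 1: (score: 0)
You can use a map:
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1", 1);
rowMap.put("T2", 2);
rowMap.put("2012", 1);
Alternatively, you can define your own class with T and Year fields, overriding its hashCode and equals methods. Then you can use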
Map<YourClass, Integer> map= new HashMap<>();
T1,2012,2
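A rough sketch of the custom-key variant, assuming each line has the form id,year,T. The nested Key class stands in for YourClass; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical example; Key plays the role of "YourClass" above.
final class TypeYearCount {

    // Immutable key: two keys are equal when both T and year match.
    static final class Key {
        private final String type;
        private final int year;

        Key(String type, int year) {
            this.type = type;
            this.year = year;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return year == other.year && type.equals(other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, year); // must be consistent with equals
        }

        @Override
        public String toString() {
            return type + "," + year;
        }
    }

    // Assumes each line looks like "id,year,T".
    static Map<Key, Integer> count(String[] lines) {
        Map<Key, Integer> map = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            Key key = new Key(fields[2], Integer.parseInt(fields[1]));
            map.merge(key, 1, Integer::sum); // add 1 to the count for this key
        }
        return map;
    }

    public static void main(String[] args) {
        String[] lines = {"1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1"};
        // HashMap iteration order is unspecified, so the lines may print in any order.
        count(lines).forEach((key, n) -> System.out.println(key + "," + n));
    }
}
```

Without the equals and hashCode overrides, two Key objects holding the same T and year would be treated as different keys and every count would stay at 1.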
Answer 2: (score: 0)
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

// Sample data; each line is id,year,T
String csv =
      "1,2012,T1\n"
    + "2,2015,T2\n"
    + "3,2013,T1\n"
    + "4,2012,T1\n";
// TreeMap keeps the keys sorted
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",");
    // Key is "T,year"
    String key = fields[2] + "," + fields[1];
    Integer value = map.get(key);
    if (value == null)
        value = 0;
    map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
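Since the question is about a file rather than an in-memory string, the same counting can be sketched against a real file with the stream API. This is only an illustration under assumptions: the file path is passed as the first command-line argument, the file is UTF-8, lines have the form id,year,T, and the class name is made up:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class; the name is illustrative only.
final class CsvFileCounter {

    // Groups the lines by "T,year" and counts each group.
    static Map<String, Long> count(BufferedReader reader) {
        return reader.lines()                              // lines are read lazily
                .filter(line -> !line.trim().isEmpty())    // skip blank lines
                .map(line -> line.split(","))
                .collect(Collectors.groupingBy(
                        fields -> fields[2] + "," + fields[1],
                        TreeMap::new,                      // sorted keys
                        Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the CSV file.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            count(reader).forEach((key, n) -> System.out.println(key + "," + n));
        }
    }
}
```

Because the lines are streamed, only the map of counts is held in memory, not the whole file.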
Answer 3: (score: 0)
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); // select the columns we are going to read

// stores the results here
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>();

// Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        // Converts the input array to a List. Lists implement hashCode and equals
        // based on their values, so they can be used as keys in the map.
        List<String> key = Arrays.asList(row);
        Integer count = results.get(key);
        if (count == null) {
            count = 0;
        }
        results.put(key, count + 1);
    }
});

// creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);

String input = "1,2012,T1"
        + "\n2,2015,T2"
        + "\n3,2013,T1"
        + "\n4,2012,T1";

// the parse() method will parse and submit all rows to your RowProcessor.
// Use a FileReader to read a file instead of the String used here as an example.
parser.parse(new StringReader(input));

// Here are the results:
for (Entry<List<String>, Integer> entry : results.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).