I have data in a file in this format:
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249448,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249447,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249443,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
Here, each entry is one line. I want to count the lines grouped by the value of field1, for example:
249449 : 2
249448 : 1
249447 : 1
249443 : 1
How can I get this?
Answer 0: (score: 3)
Using awk:
$ awk -F'[,:]' -v OFS=' : ' '{a[$2]++} END{for(k in a) print k, a[k]}' file
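Two caveats about the one-liner above: it splits each line on commas and colons, so $2 is the field1 value only as long as field1 is the first key on the line, and for (k in a) visits keys in an unspecified order, so the output may not come out in the order shown in the question. A minimal sketch that prints keys in first-seen order, under the same assumption about the line layout, might look like this (the function name count_by_field1 and the array names count and order are illustrative, not from the original answer):

```shell
# Hypothetical helper: reads JSON lines from the named files (or stdin)
# and prints "value : count" in the order each field1 value first appears.
count_by_field1() {
  awk -F'[,:]' -v OFS=' : ' '
    !($2 in count) { order[++n] = $2 }   # remember each new key once, in input order
    { count[$2]++ }                      # tally every line by its field1 value
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  ' "$@"
}
```

It would be called as count_by_field1 file.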
Answer 1: (score: 2)
You can use the jq command-line tool to parse the JSON data, and uniq -c to count the occurrences.
% jq .field1 < $INPUTFILE | sort | uniq -c
1 249443
1 249447
1 249448
2 249449
(Tested with jq 1.5-1-a5b5cbe on Linux Xubuntu 18.04, using zsh.)
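The pipeline above prints "count value" in ascending order, whereas the question asks for "value : count". One possible way to reshape it, sketched here with a hypothetical helper named tally that reads one value per line on stdin, is to sort numerically in descending order and swap the two columns with awk:

```shell
# Hypothetical helper: reads one value per line on stdin,
# prints "value : count" with the largest value first.
tally() {
  sort -rn | uniq -c | awk '{ print $2 " : " $1 }'
}

# Usage, assuming jq is installed and "file" holds the JSON lines:
#   jq .field1 < file | tally
```

Descending numeric order happens to reproduce the order in the question for this sample; it is a choice made for this sketch, not something the original answer specifies.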
Answer 2: (score: 0)
Here is an efficient jq-only solution:
reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)
| to_entries[]
| "\(.key) : \(.value)"
Invocation: jq -nrf program.jq input.json
(Note the -n option in particular.)
Of course, if an object representation of the counts is satisfactory, one could simply write:
jq -n 'reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)' input.json
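Answer 3: (score: 0)
Using datamash and a few shell utilities: convert the non-data delimiters to squeezed tabs, group and count by field 3 (it is really field 2, but there is a leading tab), reverse the order, then format the result per the OP's spec: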
tr -s '{":,}' '\t' < file | datamash -sg 3 count 3 | tac | xargs printf '%s : %s\n'
Output:
249449 : 2
249448 : 1
249447 : 1
249443 : 1
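To see why the count is taken on field 3, it can help to look at what the tr step alone produces. In the sketch below (third_field is a hypothetical helper name), every run of {, ", :, , and } collapses into a single tab, so each line starts with a tab and the field1 value lands in the third tab-separated column:

```shell
# Hypothetical helper: apply the same tr step as above, then keep only column 3.
third_field() {
  tr -s '{":,}' '\t' | cut -f3
}

# Column 1 is empty (leading tab), column 2 is the key name "field1",
# column 3 is its value.
printf '%s\n' '{"field1":249449,"field2":116895,"field3":1}' | third_field
# prints 249449
```

This relies on no earlier value containing one of the delimiter characters, which holds for the sample data because field1 comes first.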