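I am trying to sort a CSV file by its first column and add a column containing a running count of the occurrences of each unique value. In other words, it is a counter for every unique value. For example, given this input: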
DT_12341234, 2014/02/22 10:04:01
DT_12341231, 2014/02/22 10:04:01
DT_12341231, 2014/02/22 10:04:01
DT_12341232, 2014/02/22 10:04:01
DT_12341234, 2014/02/22 10:04:01
DT_12341233, 2014/02/22 10:04:01
DT_12341234, 2014/02/22 10:04:01
DT_12341233, 2014/02/22 10:04:01
DT_12341233, 2014/02/22 10:04:01
DT_12341234, 2014/02/22 10:04:01
The output here would look like this:
1, DT_12341231, 2014/02/22 10:04:01
2, DT_12341231, 2014/02/22 10:04:01
1, DT_12341232, 2014/02/22 10:04:01
1, DT_12341233, 2014/02/22 10:04:01
2, DT_12341233, 2014/02/22 10:04:01
3, DT_12341233, 2014/02/22 10:04:01
1, DT_12341234, 2014/02/22 10:04:01
2, DT_12341234, 2014/02/22 10:04:01
3, DT_12341234, 2014/02/22 10:04:01
4, DT_12341234, 2014/02/22 10:04:01
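I tried doing this with awk, uniq and sort, with no luck so far. I also could not find a case similar to mine on Stack Overflow or other forums. I would appreciate being pointed in the right direction.
Answer 0 (score: 4)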
Using sort and awk:
sort -V file | awk '{ print ++a[$1] "," $0 }'
If the lines do not already contain the extra space, change "," to ", ".
Output:
1, DT_12341231, 2014/02/22 10:04:01
2, DT_12341231, 2014/02/22 10:04:01
1, DT_12341232, 2014/02/22 10:04:01
1, DT_12341233, 2014/02/22 10:04:01
2, DT_12341233, 2014/02/22 10:04:01
3, DT_12341233, 2014/02/22 10:04:01
1, DT_12341234, 2014/02/22 10:04:01
2, DT_12341234, 2014/02/22 10:04:01
3, DT_12341234, 2014/02/22 10:04:01
4, DT_12341234, 2014/02/22 10:04:01
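To make the counting step easier to follow, here is a minimal sketch of the same idea. The function name number_rows and the array name seen are hypothetical, and it differs from the answer in a few assumed ways: it uses a plain `sort -t, -k1,1` under `LC_ALL=C` instead of GNU's `-V` (version sort), splits fields on a comma followed by optional spaces, and always prints ", " after the counter. The expression `++seen[$1]` increments the counter stored under the first field and yields the new value, so each key gets its own running count.

```shell
# number_rows: hypothetical helper, not part of the original answer.
# Reads comma-separated lines on stdin, sorts them by the first field,
# and prefixes each line with a per-key running counter.
number_rows() {
  # LC_ALL=C is an assumption made here for a predictable byte-wise order.
  LC_ALL=C sort -t, -k1,1 |
    # -F', *' splits on a comma plus optional spaces, so $1 is the bare key.
    # seen[$1] starts out empty (treated as 0); ++ makes it 1, 2, 3, ...
    awk -F', *' '{ print ++seen[$1] ", " $0 }'
}

# Example run on a few of the sample rows from the question.
printf '%s\n' \
  'DT_12341234, 2014/02/22 10:04:01' \
  'DT_12341231, 2014/02/22 10:04:01' \
  'DT_12341231, 2014/02/22 10:04:01' \
  'DT_12341232, 2014/02/22 10:04:01' |
  number_rows
```

Because the counters live in an array keyed by the first field, this version would also number rows correctly if the input were not sorted; the sort only controls the order in which the rows are printed.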
Answer 1 (score: 1)
awk and sort, without storing every first field in an array:
awk -F, -v OFS=, 'p!=$1{i=1} p==$1{i++} {p=$1; print i, $0}' < <(sort -t, -k1 file.csv)
1, DT_12341231, 2014/02/22 10:04:01
2, DT_12341231, 2014/02/22 10:04:01
1, DT_12341232, 2014/02/22 10:04:01
1, DT_12341233, 2014/02/22 10:04:01
2, DT_12341233, 2014/02/22 10:04:01
3, DT_12341233, 2014/02/22 10:04:01
1, DT_12341234, 2014/02/22 10:04:01
2, DT_12341234, 2014/02/22 10:04:01
3, DT_12341234, 2014/02/22 10:04:01
4, DT_12341234, 2014/02/22 10:04:01
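The same approach can be sketched with an ordinary pipe instead of bash process substitution. This is only an illustration under stated assumptions: the function name number_sorted_rows and the variables prev and n are hypothetical, `-k1,1` is used to restrict the sort key to the first field, `LC_ALL=C` is added for a predictable order, and OFS is set to ", " so a space follows the counter. Only the previous key and one counter are kept, so memory use does not grow with the number of distinct keys, but the input to awk must already be grouped by key.

```shell
# number_sorted_rows: hypothetical helper, not part of the original answer.
# Sorts stdin by the first comma-separated field, then numbers each run
# of identical keys while remembering only the previous key.
number_sorted_rows() {
  LC_ALL=C sort -t, -k1,1 |
    awk -F, -v OFS=', ' '
      # Appending "" forces a string comparison of the two keys.
      ($1 "") != (prev "") { n = 0 }   # new key: restart the counter
      { prev = $1; print ++n, $0 }     # count this row and print it
    '
}

# Example run on a few of the sample rows from the question.
printf '%s\n' \
  'DT_12341233, 2014/02/22 10:04:01' \
  'DT_12341231, 2014/02/22 10:04:01' \
  'DT_12341233, 2014/02/22 10:04:01' |
  number_sorted_rows
```

Unlike the array-based variant, this one relies on the sort: if equal keys were not adjacent, the counter would restart every time the key changed.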