Bash: numbering CSV contents on a unique field

Asked: 2014-08-13 15:40:15

Tags: bash sorting

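I am trying to sort a CSV file by its first column and add a column holding a running count of each occurrence of a unique value. In other words, it is a counter that restarts for each unique value.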

 DT_12341234, 2014/02/22 10:04:01
 DT_12341231, 2014/02/22 10:04:01
 DT_12341231, 2014/02/22 10:04:01
 DT_12341232, 2014/02/22 10:04:01
 DT_12341234, 2014/02/22 10:04:01
 DT_12341233, 2014/02/22 10:04:01
 DT_12341234, 2014/02/22 10:04:01
 DT_12341233, 2014/02/22 10:04:01
 DT_12341233, 2014/02/22 10:04:01
 DT_12341234, 2014/02/22 10:04:01

The output should look like this:

 1, DT_12341231, 2014/02/22 10:04:01
 2, DT_12341231, 2014/02/22 10:04:01
 1, DT_12341232, 2014/02/22 10:04:01
 1, DT_12341233, 2014/02/22 10:04:01
 2, DT_12341233, 2014/02/22 10:04:01
 3, DT_12341233, 2014/02/22 10:04:01
 1, DT_12341234, 2014/02/22 10:04:01
 2, DT_12341234, 2014/02/22 10:04:01
 3, DT_12341234, 2014/02/22 10:04:01
 4, DT_12341234, 2014/02/22 10:04:01

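I have tried to do this with awk, uniq and sort, with no luck so far. I also have not found a case similar to mine on Stack Overflow or other forums. I would appreciate being pointed in the right direction.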

2 answers:

Answer 0 (score: 4)

Using sort and awk:

sort file -V | awk '{ print ++a[$1] "," $0 }'

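If your lines do not start with the extra space, change "," to ", " so that the counter is still followed by a comma and a space.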

Output:

1, DT_12341231, 2014/02/22 10:04:01
2, DT_12341231, 2014/02/22 10:04:01
1, DT_12341232, 2014/02/22 10:04:01
1, DT_12341233, 2014/02/22 10:04:01
2, DT_12341233, 2014/02/22 10:04:01
3, DT_12341233, 2014/02/22 10:04:01
1, DT_12341234, 2014/02/22 10:04:01
2, DT_12341234, 2014/02/22 10:04:01
3, DT_12341234, 2014/02/22 10:04:01
4, DT_12341234, 2014/02/22 10:04:01
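
To make the one-liner easier to experiment with, here is a minimal sketch that wraps the same idea in a shell function. The function name number_rows and the inline sample data are assumptions for illustration, not part of the answer. It also swaps two details: plain sort replaces sort -V (a GNU extension), which should give the same order whenever all IDs have the same length, and awk is given -F, so the counter key is the text before the first comma rather than the first whitespace-separated word.

```shell
#!/usr/bin/env bash
# number_rows: hypothetical helper name, not from the answer above.
# Reads CSV lines on stdin, sorts them, and prefixes each line with a
# running count of how many times its first field has been seen so far.
number_rows() {
    # a[$1] is empty (treated as 0) the first time a key appears,
    # so ++a[$1] yields 1, 2, 3, ... separately for each distinct key.
    LC_ALL=C sort | awk -F, '{ print ++a[$1] "," $0 }'
}

# Sample data (assumed), with the same leading space as the question's input.
printf '%s\n' \
    ' DT_12341234, 2014/02/22 10:04:01' \
    ' DT_12341231, 2014/02/22 10:04:01' \
    ' DT_12341234, 2014/02/22 10:04:01' |
    number_rows
```

Because the counts live in an array indexed by key, the awk part would also number rows correctly on unsorted input; the sort is only there to group equal keys together in the output.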

Answer 1 (score: 1)

Using awk and sort, without storing all of the first fields in an array:

awk -F, -v OFS=, 'p!=$1{i=1} p==$1{i++} {p=$1; print i, $0}' < <(sort -t, -k1 file.csv)
1, DT_12341231, 2014/02/22 10:04:01
2, DT_12341231, 2014/02/22 10:04:01
1, DT_12341232, 2014/02/22 10:04:01
1, DT_12341233, 2014/02/22 10:04:01
2, DT_12341233, 2014/02/22 10:04:01
3, DT_12341233, 2014/02/22 10:04:01
1, DT_12341234, 2014/02/22 10:04:01
2, DT_12341234, 2014/02/22 10:04:01
3, DT_12341234, 2014/02/22 10:04:01
4, DT_12341234, 2014/02/22 10:04:01
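
The same previous-key idea can be sketched as a plain pipeline, which avoids the bash-only process substitution. The function name count_by_prev and the sample rows are assumptions for illustration. This version uses -k1,1 so that the sort key is limited to the first field (-k1 alone extends the key to the end of the line), and it folds the two conditions into a single counter reset.

```shell
#!/usr/bin/env bash
# count_by_prev: hypothetical helper name for illustration.
# Numbers sorted rows by comparing each key with the previous line's key,
# so only one key is held in memory at a time. Input must be grouped by
# key, which the sort step takes care of.
count_by_prev() {
    LC_ALL=C sort -t, -k1,1 |
        awk -F, -v OFS=, '
            NR == 1 || $1 != prev { n = 0 }   # new key: restart the counter
            { prev = $1; print ++n, $0 }      # count this row, then print it
        '
}

# Sample data (assumed), with the same leading space as the question's input.
printf '%s\n' \
    ' DT_12341233, 2014/02/22 10:04:01' \
    ' DT_12341232, 2014/02/22 10:04:01' \
    ' DT_12341233, 2014/02/22 10:04:01' |
    count_by_prev
```

Unlike the array-based approach, this one relies on equal keys being adjacent, so it would give wrong counts if the sort step were dropped.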