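I want to add a sequence number to duplicate lines

Time: 2015-08-17 09:43:27

Tags: perl awk

I have to check a file for duplicate lines and append a character to the end of the line. However, what I want is to append a sequence number to the end of each duplicate line.

The data looks like this: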

add sample A1
add sample A2
add sample A2
add sample A3
add sample A3
add sample A3
add sample A4

How can I use awk to format the data as shown below?

add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4

4 answers:

Answer 0: (score: 1)

With awk, you can write something like:

awk 'col[$3]++{print $0"_"col[$3]-1;next}1' input
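  • col[$3]++ looks up the third field in the associative array col and post-increments its count. The expression yields the count from before the increment, so it is true from the second occurrence of a value onward, and only then does the action run.

  • print $0"_"col[$3]-1 prints the whole record followed by an underscore and the number of earlier occurrences; next then skips to the next record.

  • 1 is always true, so the default action (printing the whole line) runs for every first occurrence.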

Test

$ awk 'col[$3]++{print $0"_"col[$3]-1;next}1' input
add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4
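
For readers who find the compact form hard to follow, here is a more verbose sketch of the same idea. The function name number_by_field3 is made up for illustration, and the logic is only meant to mirror the one-liner above, not replace it.

```shell
# Hypothetical helper: number repeats of the third field, like the one-liner above.
number_by_field3() {
    awk '{
        n = col[$3]++        # n = how many times this key appeared before this line
        if (n > 0)
            print $0 "_" n   # repeat: append the running count
        else
            print            # first occurrence: print the line unchanged
    }' "$@"
}

printf 'add sample A1\nadd sample A2\nadd sample A2\n' | number_by_field3
# add sample A1
# add sample A2
# add sample A2_1
```

Storing the pre-increment value in n avoids the -1 adjustment, at the cost of a few more lines.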

Answer 1: (score: 1)

One way with awk:

awk '{count=seen[$0]++; print $0 (count ? "_"count: "")}' file
add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4

Explanation:

count=seen[$0]++     # Save how many times this line was seen before, then increment the tally
print $0             # Print the line ($0 contains the whole line)
(count?"_"count:"")  # If count is truthy (>0), also append "_" followed by count
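
Because the tally lives in an array keyed by the line itself, this approach should not need the duplicates to be adjacent. A small sketch with made-up, unsorted input (the wrapper name number_whole_line is only for illustration):

```shell
# Hypothetical wrapper around the command from this answer.
number_whole_line() {
    awk '{count=seen[$0]++; print $0 (count ? "_"count : "")}' "$@"
}

# The two "add sample A2" lines are separated by another line.
printf 'add sample A2\nadd sample A1\nadd sample A2\n' | number_whole_line
# add sample A2
# add sample A1
# add sample A2_1
```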

Answer 2: (score: 0)

Perlishly:

#!/usr/bin/env perl
use strict;
use warnings;

my %seen;
while (<>) {
    chomp;
    print;
    my ($doodad) = m{\b(\w+)$};   #grab last word on line
    if ( $seen{$doodad}++ ) {
        print "_", $seen{$doodad} - 1; #print taggy thing if it has been seen. 
    }
    print "\n";
}

This can be condensed into a one-liner; it would look much like the awk answers you've already got.

Answer 3: (score: 0)

If you want to consider the whole line in the comparison, that would be more like:

awk '{print $0 (done[$0]++ ? "_" (done[$0]-1) : "") }'
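
To see where the choice of key matters, here is a small sketch with made-up input in which two lines share a third field but differ elsewhere. The function names by_field3 and by_line are hypothetical.

```shell
# Key on the third field only.
by_field3() { awk '{n = seen[$3]++; print $0 (n ? "_" n : "")}' "$@"; }

# Key on the whole line.
by_line() { awk '{n = seen[$0]++; print $0 (n ? "_" n : "")}' "$@"; }

# Hypothetical input: same third field, different first field.
sample='add sample A2
del sample A2'

printf '%s\n' "$sample" | by_field3
# add sample A2
# del sample A2_1

printf '%s\n' "$sample" | by_line
# add sample A2
# del sample A2
```

For the data in the question both keys give the same result, since the first two fields never vary.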