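I need to check a file for duplicate lines and append something to the end of each one. Specifically, I want to append a sequence number to the end of every repeated line.
The data looks like this: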
add sample A1
add sample A2
add sample A2
add sample A3
add sample A3
add sample A3
add sample A4
How can I use awk to format the data as shown below?
add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4
Answer 0 (score: 1)
With awk, you can write something like:
awk 'col[$3]++{print $0"_"col[$3]-1;next}1' input
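col[$3]++
increments the count for the third column, kept in the associative array col. The post-increment returns the previous count, so the pattern is true only when this value has already appeared on an earlier line; the action then prints the line followed by "_" and the count minus one, and next moves on to the next record.
print $0
prints the entire record.
1
is always true, so the default action (printing the whole line) runs for every line that was not handled above.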
Test
$ awk 'col[$3]++{print $0"_"col[$3]-1;next}1' input
add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4
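For readability, the same logic can be spelled out with an explicit counter. This is only a sketch: the function name number_by_col3 and the variable n are made up for illustration, and it assumes the key is always the third whitespace-separated field.

```shell
# Hypothetical helper: number repeated values of field 3 (the name is illustrative).
number_by_col3() {
  awk '{
    n = col[$3]++            # n = how many times this key was seen before this line
    if (n > 0) {
      print $0 "_" n         # repeat: append "_" and the running count
    } else {
      print                  # first occurrence: print the line unchanged
    }
  }' "$@"
}

# Reads the named files, or standard input when no file is given.
printf '%s\n' 'add sample A1' 'add sample A2' 'add sample A2' | number_by_col3
```

Storing the previous count in n first avoids the "minus one" adjustment that the one-liner needs.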
Answer 1 (score: 1)
One way with awk:
awk '{count=seen[$0]++; print $0 (count ? "_"count: "")}' file
add sample A1
add sample A2
add sample A2_1
add sample A3
add sample A3_1
add sample A3_2
add sample A4
Explanation:
count=seen[$0]++ # Increment the number of times this line has been seen
print $0 # Print the line ($0 contains the whole line)
(count?"_"count:"") # If the count is truthy (>0), also print "_" followed by the count
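Because seen is keyed on the whole line and persists for the entire run, repeats do not have to be adjacent to be numbered. A quick check with interleaved input; the sample lines and the wrapper name number_dups are made up here and are not part of the question:

```shell
# Hypothetical wrapper around the one-liner above (the name is illustrative).
number_dups() {
  awk '{count=seen[$0]++; print $0 (count ? "_"count: "")}' "$@"
}

# Interleaved duplicates: A2 and A3 alternate instead of being grouped.
printf '%s\n' 'add sample A2' 'add sample A3' 'add sample A2' 'add sample A3' 'add sample A2' |
  number_dups
```

The third and fifth lines should come out as add sample A2_1 and add sample A2_2, even though other lines sit between the repeats.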
Answer 2 (score: 0)
Perlishly:
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
while (<>) {
    chomp;
    print;
    my ($doodad) = m{\b(\w+)$};    # grab last word on line
    if ( $seen{$doodad}++ ) {
        print "_", $seen{$doodad} - 1;    # print taggy thing if it has been seen.
    }
    print "\n";
}
This could be condensed into a one-liner; it would look a bit like the awk answers you have already got.
Answer 3 (score: 0)
If you want to consider the whole line in the comparison, it would be more like:
awk '{print $0 (done[$0]++ ? "_" (done[$0] - 1) : "") }'
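The choice of key matters when lines differ outside the third field. A sketch comparing both keys on made-up input (the del sample A2 line is hypothetical and not from the question): keyed on $3, the two commands count as the same item; keyed on $0, they are distinct.

```shell
# Hypothetical input: two lines share field 3 but differ in field 1.
input=$(printf '%s\n' 'add sample A2' 'del sample A2' 'add sample A2')

# Keyed on the third field: the "del" line counts as a repeat of A2.
by_field=$(printf '%s\n' "$input" | awk '{n = seen[$3]++; print $0 (n ? "_" n : "")}')

# Keyed on the whole line: only the identical third line is a repeat.
by_line=$(printf '%s\n' "$input" | awk '{n = seen[$0]++; print $0 (n ? "_" n : "")}')

printf 'by field:\n%s\n\nby line:\n%s\n' "$by_field" "$by_line"
```

With the field key, the del line gets a _1 suffix and the last line gets _2; with the whole-line key, only the last line is suffixed, with _1.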