使用awk计算字符并相应地修改文件

时间:2013-02-05 12:00:25

标签: awk counting

我有一个看起来像这样的文件

@FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
@FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`cee
@FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTG
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghf

我想转到每一行并计算字符数。如果该行包含少于例如66个字符然后用'A'填充到66并打印到新文件。如果它包含66个字符,那么只需按原样打印该行。

输出文件看起来像这样;

@FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
@FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`ceeAAAAAAAAAAAAA
@FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

我对awk有非常基本的了解,所以从学习的角度来看,我想用awk来解决问题。

5 个答案:

答案 0 :(得分:4)

一种方式:

awk '!(NR%2) && length<66{for(i=length;i<66;i++)$0=$0 "A"}1' file

答案 1 :(得分:1)

awk 'NR%2 == 0{
    printf("%s", $0)
    for(i=length($0); i<66; i++)printf("A")
    print "";next }
    {print}'

答案 2 :(得分:1)

我会粘贴另一个奇怪的(可能)oneliner:

 awk 'BEGIN{while(++i<66)t=t"A"}!(NR%2){$0=$0substr(t,length)}1' file

答案 3 :(得分:1)

这应该比接受的方法更快:

awk 'NR%2==0 { x = sprintf("%-66s", $0); gsub(/ /,"A",x); $0 = x }1' file

结果:

@FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
@FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`ceeAAAAAAAAAAAAA
@FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

答案 4 :(得分:0)

awk -v FS= '{printf "%s",$0} !(NR%2){for (i=NF+1;i<=66;i++) printf "A"} {print ""}'

或者如果你不喜欢循环:

awk -v FS= '{sfx=(NR%2 ? "" : sprintf("%*s",66-NF,"")); gsub(/ /,"A",sfx); print $0 sfx}'