How to edit a text file using awk

Time: 2018-02-12 13:31:25

Tags: awk

I have a text file like this example:

>chr1:368597-368634
ATGATATAATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAACAAAAAATTTTAAAAGAAAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAAC

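Each group starts with an ID line beginning with >, and the next line is a sequence of characters. On that second line I want to keep the last 29 characters and delete the rest. So the output should look like this: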

>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC

How can I do this with awk?

4 answers:

Answer 0: (score: 1)

awk solution:

awk 'r~/^>/{ print r ORS substr($0, length-28) }{ r=$0 }' a1

Output:

>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC
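
The one-liner above remembers the previous line in r and, whenever that remembered line was a header, prints it together with the tail of the current line. A minimal sketch of the same idea with the tail length passed in as a variable follows; the function name keep_tail and the variable n are illustrative and not part of the original answer.

```shell
# keep_tail N [FILE...]: hypothetical helper, not part of the original answer.
# Prints each header line (starting with ">") followed by the last N
# characters of the line right after it.
keep_tail() {
    n=$1
    shift
    awk -v n="$n" '
        prev ~ /^>/ {                 # the previous line was a header
            start = length($0) - n + 1
            if (start < 1) start = 1  # line shorter than n: keep all of it
            print prev
            print substr($0, start)
        }
        { prev = $0 }                 # remember this line for the next one
    ' "$@"
}

# Example on a tiny made-up record, keeping the last 4 characters:
printf '>id1\nACGTACGTTT\n' | keep_tail 4
```

For the file in the question this would be called as keep_tail 29 a1.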

Answer 1: (score: 0)

The following awk may help you:

awk '/^>/{print;flag=1;next} flag && NF{print substr($0,length($0)-28);flag=""}'   Input_file

Explanation:

awk '
/^>/{                             ## If the line starts with ">" (a header line):
  print;                          ## print the header unchanged,
  flag=1;                         ## set flag to mark that the next non-empty line is the sequence to trim,
  next                            ## and skip the remaining rules for this line.
}
flag && NF{                       ## If flag is set and the line has at least one field (NF is non-zero, so the line is not blank):
  print substr($0,length($0)-28); ## print the substring from position length-28 to the end, which is the last 29 characters,
  flag=""                         ## and clear flag until the next header line is found.
}
'  Input_file                     ## Input_file is the name of the input file.
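
The script above trims only the first non-empty line after each header. FASTA files are often wrapped at a fixed width, and in that case the tail has to be taken from the joined sequence instead. A minimal sketch under that assumption follows; the function name fasta_tail and the variable n are illustrative.

```shell
# fasta_tail N [FILE...]: hypothetical helper for wrapped FASTA input.
fasta_tail() {
    n=$1
    shift
    awk -v n="$n" '
        # Print the stored header and the last n characters of the joined sequence.
        function flush_record(   start) {
            if (header == "") return
            start = length(seq) - n + 1
            if (start < 1) start = 1
            print header
            print substr(seq, start)
        }
        /^>/ { flush_record(); header = $0; seq = ""; next }  # a new record starts
        { seq = seq $0 }                                       # append a wrapped line
        END { flush_record() }                                 # emit the last record
    ' "$@"
}

# Example on two made-up records, the first one wrapped over two lines:
printf '>id1\nAAAACCCC\nGGGGTT\n>id2\nACGT\n' | fasta_tail 5
```

Each record is buffered until the next header or the end of input, so the tail is always cut from the complete sequence.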

Answer 2: (score: 0)

awk '/^>/||$0=substr($0, length($0)-28)' file
The awk one-liner above should help you:

kent$  awk '/^>/||$0=substr($0, length($0)-28)' f
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC

Answer 3: (score: 0)

Another awk that uses the modulus of the line number to decide whether to print a line as-is or trim it:

$ awk 'NR%2;!(NR%2){print substr($0,length()-28)}' file
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC
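
This version relies on the file strictly alternating header and sequence lines, with headers on odd line numbers. A quick check of that assumption before running it might look like the sketch below; the function name check_alternating is illustrative.

```shell
# check_alternating [FILE...]: hypothetical helper.
# Exits non-zero and reports the first line that breaks the
# header/sequence alternation (headers on odd lines, sequences on even lines).
check_alternating() {
    awk '
        { is_header = ($0 ~ /^>/) ? 1 : 0 }
        is_header != NR % 2 {
            printf "line %d breaks the header/sequence alternation\n", NR > "/dev/stderr"
            bad = 1
            exit
        }
        END { exit bad }
    ' "$@"
}

# Example on a made-up, well-formed input:
printf '>id1\nACGT\n>id2\nTTGA\n' | check_alternating && echo "layout ok"
```

If the check fails, one of the header-driven approaches is the safer choice, since it does not depend on line numbers.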