I have a text file like this example:
>chr1:368597-368634
ATGATATAATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAACAAAAAATTTTAAAAGAAAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
GGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAAC
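The first line of each group is an ID that starts with >, and the next line is a string of characters. From that second line I want to keep only the last 29 characters and delete the rest, so the output would look like this: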
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC
How can I do this with awk?
Answer 0: (score: 1)
An awk solution:
awk 'r~/^>/{ print r ORS substr($0, length-28) }{ r=$0 }' a1
Output:
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC
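The start position length-28 in the command above follows from awk's 1-based substr: to keep the last n characters, start at length - n + 1, which is length - 28 for n = 29. A minimal sketch of that arithmetic, where the helper name last_n and its argument are hypothetical and not part of the answer:

```shell
# Hypothetical helper: print the last n characters of every input line.
# n is passed as the first argument; awk's substr() is 1-based, so the
# last n characters start at position length - n + 1.
last_n() {
  awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

# First sequence from the question (37 characters long).
printf '%s\n' 'ATGATATAATAAGCCCTTCTCATTAAACATGATATGG' | last_n 29
# prints ATAAGCCCTTCTCATTAAACATGATATGG
```

A line shorter than n makes the start position zero or negative, in which case the two-argument substr returns the whole line, so short sequences pass through unchanged.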
Answer 1: (score: 0)
The following awk may help you:
awk '/^>/{print;flag=1;next} flag && NF{print substr($0,length($0)-28);flag=""}' Input_file
Explanation of the command above:
awk '
/^>/{                              ## If the line starts with >, it is a header line:
  print;                           ## print the header unchanged,
  flag=1;                          ## set flag to mark that the next non-empty line is the sequence,
  next                             ## and skip the remaining rules for this line.
}
flag && NF{                        ## If flag is set and the line is not empty (NF, the number of fields, is non-zero):
  print substr($0,length($0)-28);  ## print from position length-28 to the end, i.e. the last 29 characters,
  flag=""                          ## then clear flag until the next header line is seen.
}
' Input_file                       ## Name of the input file.
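The command above trims only the first non-empty line after each header, so it assumes every sequence sits on a single line. If a sequence is wrapped over several lines, as can happen in FASTA-style files, one possible variation is to join the lines of each record before trimming. This is a sketch under that assumption, not part of the answer; the function names last29_wrapped and emit are hypothetical:

```shell
# Hypothetical helper: keep the last 29 characters of each record,
# joining sequence lines that are wrapped across several lines.
last29_wrapped() {
  awk '
    # Print the trimmed sequence collected so far, then reset it.
    function emit() {
      if (seq != "") print substr(seq, length(seq) - 28)
      seq = ""
    }
    /^>/ { emit(); print; next }   # new header: flush previous record, print header
    { seq = seq $0 }               # any other line: append to the current sequence
    END { emit() }                 # flush the last record
  ' "$@"
}

# Usage with a small made-up input: the first record is wrapped over two lines.
printf '>id1\nAAAAAAAAAATTTTTTTTTTCCCCCCCCCC\nGG\n>id2\nACGT\n' | last29_wrapped
```

The trade-off is that each whole sequence is held in memory until the next header arrives, which is usually fine for short records like the ones in the question.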
Answer 2: (score: 0)
awk '/^>/||$0=substr($0, length($0)-28)' file
The awk one-liner above should help you:
kent$ awk '/^>/||$0=substr($0, length($0)-28)' f
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC
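The one-liner is compact because the assignment itself serves as the pattern: a header line matches /^>/ and is printed as is, while any other line is replaced by its last 29 characters and printed when the result is non-empty. A more explicit form of the same idea might look like the sketch below; the wrapper name trim_explicit is hypothetical:

```shell
# Hypothetical wrapper around an explicit rewrite of the one-liner.
trim_explicit() {
  awk '
    !/^>/ { $0 = substr($0, length($0) - 28) }  # trim non-header lines in place
    1                                           # print every line, trimmed or not
  ' "$@"
}

printf '>chr1:368597-368634\nATGATATAATAAGCCCTTCTCATTAAACATGATATGG\n' | trim_explicit
```

One difference to be aware of: this version prints every line, including empty ones, whereas the one-liner skips a non-header line whose trimmed result is empty.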
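Answer 3: (score: 0)
Another awk, using the line number modulo 2 to decide whether to print a line as is or trim it. This assumes headers are on odd-numbered lines and sequences on even-numbered lines: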
$ awk 'NR%2;!(NR%2){print substr($0,length()-28)}' file
>chr1:368597-368634
ATAAGCCCTTCTCATTAAACATGATATGG
>chr1:879533-879955
GTGCAGATGGACGAGGTCTGCAGGCCTGT
>chr1:879533-879955
TTTCTTTCGGTTTCGGATGCAAAGCCTGT
>chr1:879533-879639
TTTATTTCTTTCGGTTTCGGATGCAAAAC