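I initially filtered my text file to keep only the lines in which a pattern was found (in this case "TCTGTACTATATTG"). Now, in the resulting file, I want to remove this pattern, along with every character upstream of it, from each line that contains it. What is the best way to do this with AWK?
Here is my input: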
@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA
AACAGTTTCTGTACTATATTGACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT
+
CCCFFFFFHHHHDIIJJJJJJJJJIIJEIJHHCFGFFGHIIIIJGGIJGG@GHIGEEFDGGIGIJJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA
TTTCTGTACTATATTGGGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG
+
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC>
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA
AACAGTTTCTGTACTATATTGTACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG
+
@@CFFBDDFHBDCGG<?:CEEAFEEF@A3<?<3C>FEGHGG@DB?8BF@G>?0909??DF>HE@C=)8CEH9DHCB:AED>?C@6>C;6>C3?3=@B8B=
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA
AGAACAGTTTCTGTACTATATTGTTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG
+
@@@FFFFDHHHDHHIIJJEHHJGJJIGIIEIIIIEHEGHIJDF?DGEE4??DG@FGEG:FHHHHF@D@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD
The output should look like this:
@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA
ACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT
+
CCCFFFFFHHHHDIIJJJJJJJJJIIJEIJHHCFGFFGHIIIIJGGIJGG@GHIGEEFDGGIGIJJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA
GGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG
+
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC>
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA
TACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG
+
@@CFFBDDFHBDCGG<?:CEEAFEEF@A3<?<3C>FEGHGG@DB?8BF@G>?0909??DF>HE@C=)8CEH9DHCB:AED>?C@6>C;6>C3?3=@B8B=
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA
TTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG
+
@@@FFFFDHHHDHHIIJJEHHJGJJIGIIEIIIIEHEGHIJDF?DGEE4??DG@FGEG:FHHHHF@D@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD
I have tried using awk with the split function, but I am struggling to use a string as the field separator.
Answer 0: (score: 1)
A simple sed command should work for you:
sed -i.bak 's/^.*TCTGTACTATATTG//g' file
Using awk:
awk '{sub(/^.*TCTGTACTATATTG/, "")} 1' file
However, sed has the added advantage of editing the file in place (here with -i.bak, which keeps a backup copy).
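One caveat about the regex approach: `.*` is greedy, so if the pattern occurs more than once on a line, everything up to and including the last occurrence is removed. If you would rather cut at the first occurrence, a minimal sketch using awk's index() and substr() is shown here. It treats the pattern as a literal string, so no field separator is needed. The function name strip_upstream is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper (not from the original answer): remove everything up to
# and including the FIRST occurrence of a literal pattern on each line.
# Lines that do not contain the pattern are printed unchanged.
strip_upstream() {
    awk -v pat="$1" '
        {
            pos = index($0, pat)                     # 0 when the pattern is absent
            if (pos > 0)
                $0 = substr($0, pos + length(pat))   # keep only the text after the match
            print
        }
    '
}

# Usage on a shortened record; prints ACTCATAAG, then +, then CCCFFFFF
printf '%s\n' 'AACAGTTTCTGTACTATATTGACTCATAAG' '+' 'CCCFFFFF' | strip_upstream TCTGTACTATATTG
```

To run it on a file, use strip_upstream TCTGTACTATATTG < file > file.trimmed (the output file name is just an example).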
Answer 1: (score: 0)
sed -i.bak 's/.*TCTGTACTATATTG//g' file