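I'm really sorry, I've run into the same kind of problem again with awk and sed.
I want to convert a large text file that contains the following: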
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
and so on
to:
>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
I have tried
awk 'NR==1{sub(/^[^ ]* /,"")} 1'
and
sed -i '1s/\w\+ //'
but neither of them worked.
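Answer 0 (score: 1)
I assume you want to remove the first word from every line that starts with a greater-than sign. In that case you can use awk like this:
awk '{sub(/^>[^ ]* /,">")} 1'
Drop the NR==1 restriction, which makes the block that follows it run only on the first line. Also include the > in both the pattern and the replacement, so that the marker is kept.
Output: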
>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
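As a minimal sketch of how this could be run on a file, the snippet below builds a small sample and writes the result to a new file. The file names input.fa and output.fa are placeholders, not names from the question, and the /^>/ condition is an optional addition that makes the intent explicit by skipping the substitution on sequence lines.

```shell
# Hypothetical sample input; input.fa and output.fa are placeholder names.
cat > input.fa <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# On header lines only, replace ">" plus the first word and its trailing
# space with a bare ">". The final "1" prints every line, changed or not.
awk '/^>/ {sub(/^>[^ ]* /, ">")} 1' input.fa > output.fa
```

Writing to a separate file leaves the original untouched, which is usually the safer choice for a large input.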
Answer 1 (score: 1)
Here is one way using sed:
sed '/^>/s/[^ ]* />/' file
Result:
>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
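If the file should be modified in place, one portable option is to write to a temporary file and then rename it, since the -i flag is handled differently by GNU and BSD sed. The sketch below assumes a placeholder file name seqs.fa and anchors the pattern with ^> as a small extra safeguard; neither is from the answer itself.

```shell
# Hypothetical sample input; seqs.fa is a placeholder name.
cat > seqs.fa <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# The /^>/ address selects every header line, not just line 1.
# Write to a temporary file and replace the original only if sed succeeds.
sed '/^>/s/^>[^ ]* />/' seqs.fa > seqs.fa.tmp && mv seqs.fa.tmp seqs.fa

# Count the headers that now start with ">range=".
grep -c '^>range=' seqs.fa
```

The 1s address in the original attempt limits the substitution to the first line, which is why the later headers were left unchanged.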
Answer 2 (score: 0)
It looks like you just want to remove the first field, up to the first space. You can do that with:
cut -f2- -d ' '
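One thing to be aware of with cut is that the leading > is part of the first field, so it is removed as well. Lines without a space (the sequence lines) pass through unchanged. The sketch below shows this on a placeholder file sample.fa, and then restores the marker with sed under the assumption that every header's second field starts with range=, as in the sample data.

```shell
# Hypothetical sample input; all file names here are placeholders.
cat > sample.fa <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
EOF

# cut alone: the header loses ">" along with the first word.
cut -d ' ' -f2- sample.fa > cut_only.fa

# Assumption: headers begin with "range=" after cutting, so ">" can be
# put back by matching that prefix. Sequence lines never match it.
cut -d ' ' -f2- sample.fa | sed 's/^range=/>range=/' > cut_fixed.fa
```

If the headers do not share a fixed prefix like this, the awk or sed forms above are likely a better fit because they keep the > directly.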