Modifying a text file with awk, sed, or both (again)

Posted: 2013-02-11 10:51:33

Tags: unix text sed awk

I'm really sorry, but I'm stuck on the same kind of problem again, with awk and sed.

I want to convert a large text file that contains the following:

>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT

and so on

into:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT

I have tried awk 'NR==1{sub(/^[^ ]* /,"")} 1' and sed -i '1s/\w\+ //', but neither of them produces the output above.
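To see why both attempts fall short, here is a small shell sketch that runs them on a shortened two-record copy of the sample above (one sequence line per record). It assumes a POSIX shell with mktemp and GNU sed, since \w and \+ are GNU extensions; the -i flag is dropped so the sample file is not overwritten, and the variable names are only for illustration.

```shell
#!/bin/sh
# Shortened two-record sample in the question's format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# awk attempt: NR==1 limits sub() to the first line of the file, and
# /^[^ ]* / also swallows the leading ">" because [^ ]* matches it.
awk_out=$(awk 'NR==1{sub(/^[^ ]* /,"")} 1' "$sample")

# sed attempt (GNU sed): the address 1 limits the substitution to line 1.
# ">" is not a word character, so the match starts right after it and
# line 1 comes out correct, but the second header is never touched.
sed_out=$(sed '1s/\w\+ //' "$sample")

printf '%s\n' "$awk_out" "---" "$sed_out"
rm -f "$sample"
```

In both cases only the first header changes, and the awk version loses the ">" as well.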

3 answers:

Answer 0 (score: 1)

I assume you want to remove the first word on every line that starts with a greater-than sign (>). In that case you can use awk like this:

awk '{sub(/^>[^ ]* /,">")} 1'

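Drop the NR==1 condition, which makes the block that follows run only on the first line. Also include > in both the pattern and the replacement, so that only header lines match and the > is kept.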

Output:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
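The awk answer can be tried end to end with a minimal shell sketch like the one below. It assumes a POSIX shell with mktemp and uses a shortened copy of the question's data; the temporary file and variable names are illustrative only.

```shell
#!/bin/sh
# Shortened two-record sample in the question's format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# With no NR==1 guard, sub() is attempted on every line. The ^> anchor means
# only header lines can match, and the ">" in the replacement restores the
# character the pattern consumed. The trailing 1 prints every line.
result=$(awk '{sub(/^>[^ ]* /,">")} 1' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Sequence lines never start with ">", so they pass through unchanged.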

Answer 1 (score: 1)

Here is one way to do it with sed:

sed '/^>/s/[^ ]* />/' file

Result:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
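For a large file you may want to change the file itself rather than print to the terminal. The sketch below is one portable way to do that with the sed command above, assuming a POSIX shell with mktemp: write to a temporary file, then move it over the original. GNU sed's -i does the same in one step, while BSD/macOS sed expects an explicit (possibly empty) backup suffix, as in sed -i ''. The sample data and names are illustrative.

```shell
#!/bin/sh
# Shortened two-record sample in the question's format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# /^>/ is an address: the s command runs only on header lines.
# [^ ]* matches the whole first word, ">" included, plus the space after it;
# the replacement writes the ">" back.
tmp=$(mktemp)
sed '/^>/s/[^ ]* />/' "$sample" > "$tmp" && mv "$tmp" "$sample"

result=$(cat "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```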

Answer 2 (score: 0)

It looks like you just want to delete the first field, up to the first space. You can do it like this:

cut -f2- -d ' '
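One caveat worth checking with the cut approach: the leading ">" belongs to the first field, so it is removed along with the track name, and the output no longer matches the desired result. The sketch below shows this on a shortened copy of the question's data, assuming a POSIX shell with mktemp; names are illustrative. The awk and sed answers avoid the problem by putting ">" back in the replacement.

```shell
#!/bin/sh
# Shortened two-record sample in the question's format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# -d ' ' splits on single spaces; -f2- keeps field 2 through the end of line.
# Header lines lose their first field, and the ">" goes with it.
# Sequence lines contain no space, and cut (without -s) passes lines that
# contain no delimiter through unchanged.
result=$(cut -f2- -d ' ' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```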