How do I replace characters only in lines that do not start with ">"?

Time: 2019-07-18 11:36:32

Tags: linux text awk sed

I have some text files that look like this:

>KZ289077.1 PWK_PHJ_MMCHR11_CTG1
>KZ289078.1 PWK_PHJ_MMCHR11_CTG2
>KZ289079.1 PWK_PHJ_MMCHR11_CTG3
>KZ289073.1 WSB_EIJ_MMCHR11_CTG1
GAGGAGAGGGAGAGGAGAGGGAGAGGAGAGGAGAGGGGAGRGGAGGGGGGGAGGGGAGGG
GCAGAACTGGGATTAGATCTTCTKTGAAGGTCTGATAGAACTCTGCACTAAACCCATCTG
GAAACTTCTCMATTTCATCCAGGTTCTCCAGTTTTGTTGAGTATAGCCTTTTGTAGAAGG
GGAGAGGGAGAGGAGAGGGAGAGGAGAGGAGAGGGGAGRGGAGGGGGGGAGGGGAGGGGA
TGAATTTGGGTCCTTCCCCAGGCAACCTCACGTGATGATACCTTCTTGGGGGGGGGGGRG
>KZ289074.1 WSB_EIJ_MMCHR11_CTG2
TAGTTGTTGCTAGGGTAACACGGTTGGGTTTTTTTTCCAGTATCTGAGTTCATTCTAAKG
>KZ289075.1 WSB_EIJ_MMCHR11_CTG3 

I want to replace every "R" character with "A" in the lines that do not start with ">". I tried the following code:

awk '$0~/^!">"/ {gsub(/R/,"A")}1' kk.txt 

But it does not work. The desired result looks like this:

>KZ289077.1 PWK_PHJ_MMCHR11_CTG1
>KZ289078.1 PWK_PHJ_MMCHR11_CTG2
>KZ289079.1 PWK_PHJ_MMCHR11_CTG3
>KZ289073.1 WSB_EIJ_MMCHR11_CTG1
GAGGAGAGGGAGAGGAGAGGGAGAGGAGAGGAGAGGGGAGAGGAGGGGGGGAGGGGAGGG
GCAGAACTGGGATTAGATCTTCTKTGAAGGTCTGATAGAACTCTGCACTAAACCCATCTG
GAAACTTCTCMATTTCATCCAGGTTCTCCAGTTTTGTTGAGTATAGCCTTTTGTAGAAGG
GGAGAGGGAGAGGAGAGGGAGAGGAGAGGAGAGGGGAGAGGAGGGGGGGAGGGGAGGGGA
TGAATTTGGGTCCTTCCCCAGGCAACCTCACGTGATGATACCTTCTTGGGGGGGGGGGAG
>KZ289074.1 WSB_EIJ_MMCHR11_CTG2
TAGTTGTTGCTAGGGTAACACGGTTGGGTTTTTTTTCCAGTATCTGAGTTCATTCTAAKG
>KZ289075.1 WSB_EIJ_MMCHR11_CTG3 

Can anyone give me a hint? Thanks :)

3 answers:

Answer 0: (score: 4)

A small change should do it:

awk '!/^>/ {gsub(/R/,"A")}1' file

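For lines that do not start with >, this replaces every R with A. In the original attempt the ! and the quotes sat inside the regular expression, where they are matched as literal characters; the negation has to go outside the /.../ pattern.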
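To compare the two patterns side by side, here is a minimal sketch. The two-line sample is a hypothetical excerpt, not the asker's real file; it runs the original awk program and the corrected one on the same input.

```shell
# Hypothetical sample: one header line (it contains an R in "MMCHR11")
# and one short sequence line.
sample=$(printf '>KZ289073.1 WSB_EIJ_MMCHR11_CTG1\nGGAGRGGAG\n')

# Original attempt: the regex /^!">"/ looks for the literal text !">"
# at the start of a line, so no line matches and nothing changes.
broken=$(printf '%s\n' "$sample" | awk '$0~/^!">"/ {gsub(/R/,"A")}1')

# Corrected version: the "!" outside the regex negates the match,
# so gsub runs only on lines that do not start with ">".
fixed=$(printf '%s\n' "$sample" | awk '!/^>/ {gsub(/R/,"A")}1')

printf '%s\n' "$fixed"
# >KZ289073.1 WSB_EIJ_MMCHR11_CTG1
# GGAGAGGAG
```

The trailing 1 is an always-true pattern with the default action, so every line is printed whether or not it was modified.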

Answer 1: (score: 2)

You can also do it with sed:

sed '/^[^>]/s/R/A/g' your_file
            | | | |_ globally
            | | |____replace with
            | |______replace this
            |________sed mode search and replace

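Inside square brackets, ^ means "not"; outside the brackets, it means "beginning of the line". So /^[^>]/ selects lines whose first character is anything other than >.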
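The two roles of ^ can be checked on a small input. This is only a sketch; the three-line sample below is made up for illustration.

```shell
# Hypothetical sample: one header line followed by two sequence lines.
sample=$(printf '>chr_R1 header\nACGRT\nRRKM\n')

# /^[^>]/ : the first ^ anchors to the line start, [^>] means "any character except >".
non_headers=$(printf '%s\n' "$sample" | sed -n '/^[^>]/p')

# /^>/ : anchored match on a literal ">" selects only the header line.
headers=$(printf '%s\n' "$sample" | sed -n '/^>/p')

# The substitution from the answer, restricted to the non-header lines.
result=$(printf '%s\n' "$sample" | sed '/^[^>]/s/R/A/g')

printf '%s\n' "$result"
# >chr_R1 header
# ACGAT
# AAKM
```

Note that /^[^>]/ needs at least one character to match, so it skips empty lines; that makes no difference here because an empty line has nothing to replace.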

Answer 2: (score: 0)

This might work for you (GNU sed):

sed '/^>/!y/R/A/' file

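For all lines that do not start with >, transliterate R to A. The ! after the address inverts it, and the y command maps each listed character to its counterpart.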
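For a single character, y/R/A/ should give the same result as s/R/A/g. A quick sketch on a hypothetical two-line sample:

```shell
# Hypothetical sample: a header line and a sequence line with two R's.
sample=$(printf '>KZ289074.1 WSB_EIJ_MMCHR11_CTG2\nTAGRRKG\n')

# /^>/! applies the command to every line that does NOT match /^>/.
with_y=$(printf '%s\n' "$sample" | sed '/^>/!y/R/A/')   # transliterate
with_s=$(printf '%s\n' "$sample" | sed '/^>/!s/R/A/g')  # substitute globally

printf '%s\n' "$with_y"
# >KZ289074.1 WSB_EIJ_MMCHR11_CTG2
# TAGAAKG
```

y takes no g flag because it always acts on every occurrence in the line, and it treats its arguments as plain character lists rather than regular expressions.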