Multiple conditional outputs from a single input

Time: 2019-05-09 22:18:45

Tags: awk

I have a file, test.txt. I match several patterns against it and print each set of matches to its own file, one command at a time:
    awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt >test1.txt
    awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt >test2.txt
    awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt >test3.txt

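Can I do this in a single run? Something like this: first run

    awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt

then, on the lines that did not match that pattern, run

    awk 'substr($1,5,15) ~ /abb/ { print $0 }'

and likewise, on the lines that still did not match, run

    awk 'substr($1,5,15) ~ /abc/ { print $0 }'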
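The cascade described above can be written literally as a pipeline of three awk processes: each stage writes its matches to a file and passes every other line on to the next stage. This is a minimal sketch, assuming a POSIX sh and awk; the four sample lines are in the format of test.txt and are fed on stdin so the block runs on its own.

```shell
#!/bin/sh
# Sample lines in the format of test.txt (chosen for illustration).
printf '%s\n' \
  NNNNNabcccTTTTTCTAGTCACGATAGCC \
  NNNNNaaabbCTAGTTTGTGTAGTAATTTT \
  NNNNNabcabAAAAATCTAATCTGCCAGTT \
  NNNNNaaaabTTTTTTTTTTTTTTTTTTTT |
# Stage 1: ccc lines go to test1.txt, all other lines continue on stdout.
awk 'substr($1,5,15) ~ /ccc/ { print > "test1.txt"; next } { print }' |
# Stage 2: abb lines go to test2.txt, the rest continue.
awk 'substr($1,5,15) ~ /abb/ { print > "test2.txt"; next } { print }' |
# Stage 3: abc lines go to test3.txt, anything left is dropped.
awk 'substr($1,5,15) ~ /abc/ { print > "test3.txt" }'
```

Lines that match none of the patterns (such as the aaaab line) fall out of the last stage and are discarded.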

Input file test.txt:

   NNNNNabcabAAAAATCTAATCTGCCAGTT
   NNNNNabcccTTTTTCTAGTCACGATAGCC
   NNNNNaaabbCTAGTTTGTGTAGTAATTTT
   NNNNNaaaabTTTTTTTTTTTTTTTTTTTT
   NNNNNabbbbTTTTTTCACTACTGGGTTTC
   NNNNNabcaaTTTTTTTTAATGGGTCTCAA
   NNNNNabaccTTTTTTTTTCGGGAGGCGGG
   NNNNNccaaaTTTTTTTTTTTTTATTTGAG
   NNNNNabcccTTTTTTTTTACACACAATTC
   NNNNNabcccTAAGACTGGCCCACAGCTGA
   NNNNNabcaaTAGAGACGGGGTTTCACCAT
   NNNNNabcaaTTTTTGTCGAAGATCTCACC
   NNNNNabcabTTGGTAAACAGGCGGGTGTA
   NNNNNabcccTACTTTTTTTAGTGATACAC
   NNNNNaaabbTTTTTGCAAAAAGTAATTTG
   NNNNNabcabTTTTTTTTTCTTTCTGCCTG
   NNNNNabcaaTTTTGAGACAGAATCTTGCT
   NNNNNaaabbTTTTTTTTTTTTTACTAGTG
   NNNNNabcccTAGACAGGGAATACTTTATT
   NNNNNabcabGACAGGGAATACTTATATTC

    awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt > test1.txt

test1.txt

NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT

    awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt > test2.txt

test2.txt

NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG

    awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt > test3.txt

test3.txt

NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcabGACAGGGAATACTTATATTC

When I do this, the following lines end up in both test1.txt and test3.txt:

  NNNNNabcccTAAGACTGGCCCACAGCTGA
  NNNNNabcccTACTTTTTTTAGTGATACAC
  NNNNNabcccTAGACAGGGAATACTTTATT
  NNNNNabcccTTTTTCTAGTCACGATAGCC
  NNNNNabcccTTTTTTTTTACACACAATTC

What I want is that once a line has been written to an output file, it is not checked against the remaining patterns. My expected output:

test1.txt

NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT

test2.txt

NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG

test3.txt

NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcabGACAGGGAATACTTATATTC

3 answers:

Answer 0 (score: 3)

To do all three in a single awk process, try:

awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"}
    substr($1,5,15) ~ /abb/ { print>"test2.txt"}
    substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt

Here, print>"test1.txt" prints the current line to the file test1.txt.

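Note that > does not mean quite the same thing in awk as in the shell. In awk, as in the shell, the first print to a file overwrites its previous contents. Unlike the shell, however, later print > statements to the same file within one awk run append to it.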
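The difference can be seen with a small experiment. This is a minimal sketch, assuming a POSIX sh and awk; the file names shell_out.txt and awk_out.txt are made up for the demonstration.

```shell
#!/bin/sh
# In the shell, every > redirection truncates the file again,
# so only the last line survives.
for x in a b c; do
    echo "$x" > shell_out.txt
done

# In awk, the file is truncated only when it is first opened; later
# print > statements in the same run append to the still-open file.
printf '%s\n' a b c | awk '{ print > "awk_out.txt" }'

cat shell_out.txt   # c
cat awk_out.txt     # a, b and c, one per line
```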

Variation: print only to the first matching output file

awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"; next}
    substr($1,5,15) ~ /abb/ { print>"test2.txt"; next}
    substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt

Here, when a match is found, next tells awk to skip the remaining tests and start over with the next line.
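To check the variation, it can be run on a few lines in the format of test.txt fed on stdin. This is a sketch assuming a POSIX sh and awk; the sample lines are copied from the question's data. The abccc line, which also matches /abc/, ends up only in test1.txt.

```shell
#!/bin/sh
# Sample lines copied from test.txt, read from stdin instead of the file.
printf '%s\n' \
  NNNNNabcccTTTTTCTAGTCACGATAGCC \
  NNNNNaaabbCTAGTTTGTGTAGTAATTTT \
  NNNNNabcabAAAAATCTAATCTGCCAGTT \
  NNNNNaaaabTTTTTTTTTTTTTTTTTTTT |
awk 'substr($1,5,15) ~ /ccc/ { print > "test1.txt"; next }
     substr($1,5,15) ~ /abb/ { print > "test2.txt"; next }
     substr($1,5,15) ~ /abc/ { print > "test3.txt" }'

cat test1.txt   # only the abccc line
cat test3.txt   # only the abcab line; the abccc line is not repeated here
```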

Answer 1 (score: 2)

awk '
{
    str = substr($1,5,15)
    out = 0
    if      (str ~ /ccc/) out=1
    else if (str ~ /abb/) out=2
    else if (str ~ /abc/) out=3
}
out { print > ("test" out ".txt") }
' test.txt
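The same first-match-wins logic can be generalized so that the pattern list is a parameter. This is a minimal sketch in POSIX awk; the variable name pats and the convention that pattern number i goes to test<i>.txt are illustrative choices, not part of the answer.

```shell
#!/bin/sh
# Sample lines in the format of test.txt, fed on stdin.
printf '%s\n' \
  NNNNNabcccTTTTTCTAGTCACGATAGCC \
  NNNNNaaabbCTAGTTTGTGTAGTAATTTT \
  NNNNNabcabAAAAATCTAATCTGCCAGTT \
  NNNNNaaaabTTTTTTTTTTTTTTTTTTTT |
awk -v pats='ccc abb abc' '
BEGIN { n = split(pats, P, " ") }   # P[1]="ccc", P[2]="abb", P[3]="abc"
{
    str = substr($1,5,15)
    for (i = 1; i <= n; i++)
        if (str ~ P[i]) {           # dynamic regex taken from a string
            print > ("test" i ".txt")
            break                   # stop at the first matching pattern
        }
}'
```

The order of the words in pats is the priority order, so adding or reordering patterns needs no change to the program.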

With GNU awk, you can use a switch statement instead of the if/else chain.

Answer 2 (score: 0)

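This golfed version assumes that no line matches more than one of the patterns (in this data, for example, the abccc lines contain both abc and ccc).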

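gawk '{
  n = 0                                                # reset, so lines matching nothing are skipped
  if (match(substr($1,5,15), /(ccc)|(abb)|(abc)/, A))  # probably unnecessary substring
    for (i = 1; i <= 3; i++) if (i in A) n = i         # A[i] exists only for the group that matched
  if (n) print > ("test" n ".txt")                     # print to variable filename
}' test.txt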
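The assumption matters for this data: with a single alternation, match() returns the leftmost match in the string, not the first alternative listed. A minimal check, assuming a POSIX awk and using RSTART and RLENGTH instead of gawk's array argument:

```shell
#!/bin/sh
# substr($1,5,15) of this line is "NabcccTTTTTCTAG": "abc" starts at
# position 2, "ccc" only at position 4.
result=$(echo NNNNNabcccTTTTTCTAGTCACGATAGCC | awk '{
    s = substr($1,5,15)
    # match() sets RSTART/RLENGTH to the leftmost match of the whole regex
    if (match(s, /ccc|abb|abc/))
        print substr(s, RSTART, RLENGTH), RSTART
}')
echo "$result"   # abc 2
```

In the golfed answer this is group 3, so these lines are written to test3.txt rather than test1.txt.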