Filter out rows by condition from multiple files in a folder

Time: 2015-09-18 15:33:01

Tags: r awk

How can I filter for rows that have anything but 0 in the column named mism?

freq_mir_seq                                mir_seq                                 seq                     name         freq   mir          start  end mism      add     t5      t3      s5      s3      DB      ambiguity
0_hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT    hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT  TGAGAAGAAGCACTGTAGCTCTT seq_100006_x0   0   hsa-miR-143-3p  61  81  6AT u-TT    0   0   AGTCTGAG    GCTCAGGA    miRNA   1
5_hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTA    hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTA  GACCCTGTAGATCCGAATTTGTA seq_100012_x5   5   hsa-miR-10a-5p  22  43  1GT u-A 0   u-G TATATACC    TGTGTAAG    miRNA   1
126_hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTG  hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTG  GACCCTGTAGATCCGAATTTGTG seq_100013_x126 126 hsa-miR-10a-5p  22  44  1GT 0   0   0   TATATACC    TGTGTAAG    miRNA   1
23_hsa-miR-1296-5p_TTAGGGCCCTGGCTCCATCT hsa-miR-1296-5p_TTAGGGCCCTGGCTCCATCT    TTAGGGCCCTGGCTCCATCT    seq_100019_x23  23  hsa-miR-1296-5p 16  35  0   0   0   u-CC    TGGGTTAG    CTCCTTTA    miRNA   1
3_hsa-miR-887-3p_GTGAACGGGCGCCATCCCGAGGCTT  hsa-miR-887-3p_GTGAACGGGCGCCATCCCGAGGCTT    GTGAACGGGCGCCATCCCGAGGCTT   seq_100029_x3   3   hsa-miR-887-3p  48  72  0   0   0   d-CTT   TGGAGTGA    GAGGCTTT    miRNA   1
17_hsa-miR-10a-5p_ACCCGGTAGATCCGAATTTGTG    hsa-miR-10a-5p_ACCCGGTAGATCCGAATTTGTG   ACCCGGTAGATCCGAATTTGTG  seq_10002_x17   17  hsa-miR-10a-5p  23  44  5GT 0   d-T 0   TATATACC    TGTGTAAG    miRNA   1

I tried:

df[df$mism != 0,]

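I have a folder with 100 files that all look the same. How can I run this command on all of them at once? Is that possible in R? The files are named Miraligner_*.txt.mirna, where the * part differs between files.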

1 answer:

Answer 0 (score: 4)

This should be all you need:

$ awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} $(f["mism"])' Miraligner_*.txt.mirna
freq_mir_seq                                mir_seq                                 seq                     name         freq   mir          start  end mism      add     t5      t3      s5      s3      DB      ambiguity
0_hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT    hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT  TGAGAAGAAGCACTGTAGCTCTT seq_100006_x0   0   hsa-miR-143-3p  61  81  6AT u-TT    0   0   AGTCTGAG    GCTCAGGA    miRNA   1
5_hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTA    hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTA  GACCCTGTAGATCCGAATTTGTA seq_100012_x5   5   hsa-miR-10a-5p  22  43  1GT u-A 0   u-G TATATACC    TGTGTAAG    miRNA   1
126_hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTG  hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTG  GACCCTGTAGATCCGAATTTGTG seq_100013_x126 126 hsa-miR-10a-5p  22  44  1GT 0   0   0   TATATACC    TGTGTAAG    miRNA   1
17_hsa-miR-10a-5p_ACCCGGTAGATCCGAATTTGTG    hsa-miR-10a-5p_ACCCGGTAGATCCGAATTTGTG   ACCCGGTAGATCCGAATTTGTG  seq_10002_x17   17  hsa-miR-10a-5p  23  44  5GT 0   d-T 0   TATATACC    TGTGTAAG    miRNA   1
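
In case the idiom is unfamiliar, here is a minimal sketch of what that one-liner does, run on a tiny made-up file. The file name Miraligner_a.txt.mirna, the three columns, and the two data rows are assumptions for illustration only, not taken from the real data. On the header line the script records the position of every column name in the array f; after that, a line is printed only when its mism field is "true" in awk terms, meaning a non-empty string such as mism or 1GT, while a numeric-looking 0 counts as false.

```shell
# Scratch directory with one hypothetical, tab-separated sample file.
tmp=$(mktemp -d)
printf 'name\tfreq\tmism\nseq_a\t3\t0\nseq_b\t5\t1GT\n' > "$tmp/Miraligner_a.txt.mirna"

# NR==1: map each column name to its index (f["mism"] becomes 3 here).
# The bare expression $(f["mism"]) is the pattern: lines where it is
# a non-empty string or a non-zero number are printed.
kept=$(awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} $(f["mism"])' "$tmp/Miraligner_a.txt.mirna")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

This prints the header line and the seq_b row and drops seq_a. Note that with NR==1 the column map is built from the first file only, so when several files are passed the command relies on all of them having the same column order; the header of each later file is still printed because the string mism is itself "true".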

To create a new, separate output file named Miraligner_foo.txt.mirna.out for each input file Miraligner_foo.txt.mirna, you could do:

awk 'FNR==1{out=FILENAME".out"; for (i=1;i<=NF;i++) f[$i]=i} $(f["mism"]){print > out}' Miraligner_*.txt.mirna

If you aren't using GNU awk, the command above may fail with a "too many open files" error, so close the previous output file before opening the next one:

awk 'FNR==1{close(out); out=FILENAME".out"; for (i=1;i<=NF;i++) f[$i]=i} $(f["mism"]){print > out}' Miraligner_*.txt.mirna
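
As a quick sanity check, the per-file version can be tried on two throwaway files first. This is only a sketch: the file names, columns, and values below are made up, and the second file deliberately has its columns in a different order to show that FNR==1 rebuilds the column map for every input file.

```shell
# Scratch directory with two hypothetical, tab-separated inputs.
tmp=$(mktemp -d)
printf 'name\tmism\nseq_a\t0\nseq_b\t1GT\n' > "$tmp/Miraligner_a.txt.mirna"
printf 'mism\tname\n5GT\tseq_c\n0\tseq_d\n' > "$tmp/Miraligner_b.txt.mirna"

# FNR==1 fires on the first line of each file: close the previous output,
# derive the new output name, and remap the column names to positions.
awk 'FNR==1{close(out); out=FILENAME".out"; for (i=1;i<=NF;i++) f[$i]=i}
     $(f["mism"]){print > out}' "$tmp"/Miraligner_*.txt.mirna

# Collect each result file for inspection before cleaning up.
out_a=$(cat "$tmp/Miraligner_a.txt.mirna.out")
out_b=$(cat "$tmp/Miraligner_b.txt.mirna.out")
printf '%s\n---\n%s\n' "$out_a" "$out_b"
rm -rf "$tmp"
```

Each .out file ends up with its own header plus only the rows whose mism value is not 0. The shell expands the glob before awk starts, so the freshly written .out files are not picked up as inputs.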