bash: grep files for matches but include each file's unique header line

Date: 2016-08-30 18:51:13

Tags: bash grep

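In the bash script below I loop through a directory and run grep on every .txt file. What I want is for each file's header line to be included in its filtered result. Currently the header is printed to `stdout`, and the two new filtered files contain the matching lines without a header. The code below seems close, but I can't get each file's unique header into the output. Thank you :).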

bash

for file in /home/cmccabe/compare/*.txt ; do
 bname=$(basename "$file")
 pref=${bname%%.txt}
 case $pref in *_filtered) continue ;; esac   # skip output left by an earlier run
 head -n 1 "$file"   # the header goes to stdout, not into the filtered file
 grep -wFf /home/cmccabe/compare/list "$file" > "/home/cmccabe/compare/${pref}_filtered.txt"
done

file1

Index   Chromosomal Position    Gene    
4   43394661    SLC2A1
22  166870221   SCN1A
22  166870952   CBS

file2

Chrom   Position    Gene Symbol Target ID
chr22   40742831    ADSL    AMPL3764590328
chr22   40745898    ADSL    AMPL5177720331
chr5    125885803   ALDH7A1 AMPL4306766150
chr5    178555085   FBN1    AMPL4306766155

list (patterns used by grep)

SLC2A1
SCN1A
ADSL
ALDH7A1

Desired file1_filtered output

Index   Chromosomal Position    Gene
4   43394661    SLC2A1
22  166870221   SCN1A

Desired file2_filtered output

Chrom   Position    Gene Symbol Target ID
chr22   40742831    ADSL    AMPL3764590328
chr22   40745898    ADSL    AMPL5177720331
chr5    125885803   ALDH7A1 AMPL4306766150

2 answers:

Answer 0 (score: 2)

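Using GNU grep and bash's process substitution, add the file's own header line to the pattern list so that it matches itself: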

grep -wf <(head -n 1 file1; cat list) file1

Output:

Index   Chromosomal Position    Gene    
4   43394661    SLC2A1
22  166870221   SCN1A

grep -wf <(head -n 1 file2; cat list) file2

Output:

Chrom   Position    Gene Symbol Target ID
chr22   40742831    ADSL    AMPL3764590328
chr22   40745898    ADSL    AMPL5177720331
chr5    125885803   ALDH7A1 AMPL4306766150
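To produce the *_filtered.txt files the question asks for, the same process-substitution trick can be run once per file inside a loop. This is a sketch under a few assumptions: it needs bash (process substitution is not available in POSIX sh), `filter_dir` is a hypothetical helper name, and `-F` is added so the header line is matched as a fixed string rather than as a regular expression. Inputs whose names already end in _filtered.txt are skipped, so rerunning it does not create *_filtered_filtered.txt files.

```shell
#!/usr/bin/env bash
# Sketch: run the header-as-pattern grep on every .txt file in a directory.
# filter_dir is a hypothetical helper name.
# Usage: filter_dir DIR LISTFILE
filter_dir() {
    local dir=$1 list=$2 file pref
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue                  # glob matched nothing
        pref=${file%.txt}
        case $pref in *_filtered) continue ;; esac  # skip earlier output
        # The file's first line joins the pattern list, so the header matches
        # itself; -F treats every pattern (header included) as a fixed string.
        grep -wFf <(head -n 1 "$file"; cat "$list") "$file" > "${pref}_filtered.txt"
    done
}
```

For example, `filter_dir /home/cmccabe/compare /home/cmccabe/compare/list` would write file1_filtered.txt and file2_filtered.txt next to the inputs; the list file itself is not matched by the *.txt glob.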

Answer 1 (score: 1)

You're going about this the wrong way. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice, and then do this:

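awk '
BEGIN { FS="\t" }                   # input files are tab-separated
NR==FNR { genes[$0]; next }         # first file (list): store each gene name
FNR==1 {                            # header line of each .txt file
    close(out)                      # close the previous output file
    out = FILENAME
    sub(/\.txt$/,"_filtered&",out)  # file1.txt -> file1_filtered.txt
    g = 0                           # reset the gene column for this file
    for (i=1; i<=NF; i++) {
        if ( $i == "Gene" || $i == "Gene Symbol" ) {
            g = i
        }
    }
}
(FNR==1) || (g && ($g in genes)) { print > out }   # header + matching rows
' /home/cmccabe/compare/list /home/cmccabe/compare/*.txt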

It's more robust, more efficient, and more portable than what you're currently doing.
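A smaller change to the loop in the question also works: send the header into the output file together with grep's result by grouping both commands under a single redirection. In this sketch (`filter_keep_header` is a hypothetical name), `tail -n +2` keeps the header out of grep's input, so the header is written exactly once no matter what the list contains, and no process substitution is needed.

```shell
#!/usr/bin/env bash
# Sketch: the loop from the question, with the header redirected into the
# filtered file instead of stdout. filter_keep_header is a hypothetical name.
# Usage: filter_keep_header DIR LISTFILE
filter_keep_header() {
    local dir=$1 list=$2 file pref
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue                  # glob matched nothing
        pref=${file%.txt}
        case $pref in *_filtered) continue ;; esac  # skip earlier output
        {
            head -n 1 "$file"                       # header line, always kept
            tail -n +2 "$file" | grep -wFf "$list"  # matching data lines only
        } > "${pref}_filtered.txt"
    done
}
```

If no data line matches, grep exits with status 1 and the filtered file contains only the header.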