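The bash loop below goes through a directory and runs grep on all the .txt files. What I am trying to do is include the header line of each file in the filtered results. Currently the header is printed to stdout, and the two new filtered files are written without it. The code below seems close, but I cannot get each file's own header into its output. Thank you :).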
for file in /home/cmccabe/compare/*.txt ; do
    bname=$(basename $file)
    pref=${bname%%.txt}
    [ "$file" = /home/cmccabe/compare/${pref}_filtered.txt ] && continue
    head -n 1 "$file"
    grep -wFf /home/cmccabe/compare/list $file > /home/cmccabe/compare/${pref}_filtered.txt
done
file1
Index Chromosomal Position Gene
4 43394661 SLC2A1
22 166870221 SCN1A
22 166870952 CBS
file2
Chrom Position Gene Symbol Target ID
chr22 40742831 ADSL AMPL3764590328
chr22 40745898 ADSL AMPL5177720331
chr5 125885803 ALDH7A1 AMPL4306766150
chr5 178555085 FBN1 AMPL4306766155
list (used for grep)
SLC2A1
SCN1A
ADSL
ALDH7A1
Desired output for file1_filtered
Index Chromosomal Position Gene
4 43394661 SLC2A1
22 166870221 SCN1A
Desired output for file2_filtered
Chrom Position Gene Symbol Target ID
chr22 40742831 ADSL AMPL3764590328
chr22 40745898 ADSL AMPL5177720331
chr5 125885803 ALDH7A1 AMPL4306766150
Answer 0 (score: 2)
With GNU grep and bash's process substitution:
grep -wf <(head -n 1 file1; cat list) file1
Output:
Index Chromosomal Position Gene
4 43394661 SLC2A1
22 166870221 SCN1A
grep -wf <(head -n 1 file2; cat list) file2
Output:
Chrom Position Gene Symbol Target ID
chr22 40742831 ADSL AMPL3764590328
chr22 40745898 ADSL AMPL5177720331
chr5 125885803 ALDH7A1 AMPL4306766150
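The one-liners above print to the terminal for one file at a time. One possible way to get the same result inside the question's loop is to group head and grep and redirect the group once, so the header and the matching rows land in the same file. This is a sketch of a different technique (command grouping rather than process substitution), not the answerer's code; the scratch directory and the tab-separated sample data are assumptions standing in for /home/cmccabe/compare.

```bash
#!/bin/sh
# Sketch only: a scratch directory stands in for /home/cmccabe/compare.
set -eu
dir=$(mktemp -d)
list=$dir/list

# Sample data from the question, assumed to be tab-separated.
printf 'Index\tChromosomal Position\tGene\n4\t43394661\tSLC2A1\n22\t166870221\tSCN1A\n22\t166870952\tCBS\n' > "$dir/file1.txt"
printf 'SLC2A1\nSCN1A\nADSL\nALDH7A1\n' > "$list"

for file in "$dir"/*.txt; do
    case $file in
        *_filtered.txt) continue ;;   # do not re-filter earlier results
    esac
    out=${file%.txt}_filtered.txt
    {
        head -n 1 "$file"             # header line, copied unchanged
        # Data rows that name a listed gene; "|| true" keeps set -e
        # from aborting the script when no row matches.
        tail -n +2 "$file" | grep -wFf "$list" || true
    } > "$out"
done

cat "$dir/file1_filtered.txt"
```

Because the header is copied by head rather than passed to grep as a pattern, it is never interpreted as a search term, and -F can stay in place for the gene names.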
Answer 1 (score: 1)
You are going about this the wrong way. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice and then do this:
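awk '
BEGIN { FS="\t" }
NR==FNR { genes[$0]; next }            # first file: the gene list
FNR==1 {                               # header line of each data file
    close(out)
    out = FILENAME
    sub(/\.txt$/,"_filtered&",out)     # file1.txt -> file1_filtered.txt
    g = 0
    for (i=1; i<=NF; i++) {
        if ( $i ~ /^Gene/ ) {          # also matches "Gene Symbol"
            g = i
        }
    }
}
(FNR==1) || (g && ($g in genes)) { print > out }
' /home/cmccabe/compare/list /home/cmccabe/compare/*.txt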
It is more robust, more efficient, and more portable than what you are currently doing.
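To try the awk approach without touching real data, a self-contained sketch along these lines may help. It assumes the files are tab-separated, passes the gene list as the first argument so that the NR==FNR block reads it, and matches any header beginning with "Gene" (an assumption made here so that file2's "Gene Symbol" column is found). The scratch directory and the shortened file2 are illustrative only.

```bash
#!/bin/sh
# Sketch only: a scratch directory stands in for /home/cmccabe/compare.
set -eu
dir=$(mktemp -d)

# Sample data from the question, assumed to be tab-separated.
printf 'Index\tChromosomal Position\tGene\n4\t43394661\tSLC2A1\n22\t166870221\tSCN1A\n22\t166870952\tCBS\n' > "$dir/file1.txt"
printf 'Chrom\tPosition\tGene Symbol\tTarget ID\nchr22\t40742831\tADSL\tAMPL3764590328\nchr5\t178555085\tFBN1\tAMPL4306766155\n' > "$dir/file2.txt"
printf 'SLC2A1\nSCN1A\nADSL\nALDH7A1\n' > "$dir/list"

awk '
BEGIN { FS = "\t" }
NR == FNR { genes[$0]; next }          # first file: the gene list
FNR == 1 {                             # header line of each data file
    close(out)
    out = FILENAME
    sub(/\.txt$/, "_filtered&", out)
    g = 0
    for (i = 1; i <= NF; i++)
        if ($i ~ /^Gene/) g = i        # assumption: column name starts with "Gene"
    print > out                        # always keep the header
    next
}
g && ($g in genes) { print > out }     # keep rows whose gene is listed
' "$dir/list" "$dir"/*.txt

cat "$dir/file1_filtered.txt" "$dir/file2_filtered.txt"
```

Comparing one field against the list, instead of searching the whole line as grep does, means a gene name that happens to appear in another column cannot cause a false match.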