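I am trying to run queries against a large file. I am using awk inside a bash script. The bash script reads some parameters (line by line) from a parameter file, puts them into variables, and passes them to awk. The result of each query needs to be stored in a separate file whose name is specified in the parameter file: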
#!/bin/bash
while IFS=\t read chr start end name
do
echo $chr $start $end $name
awk -v "chr=$chr" -v "start=$start" -v "end=$end" '$1==chr && $3>start && $3<end && $11<5E-2 {print $0}' bigfile.out > ${name}.out
done < parameterfile
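Unfortunately, the awk command does not produce any output. Any idea what might be wrong? (Judging by the echo command, the bash variables are assigned correctly.)
Answer 0 (score: 1)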
IMHO Bash does not understand "\t" in IFS: unquoted, \t is just the letter t, not a tab. Try this:
while IFS=$(echo -e "\t") read chr start end name
do
echo =$chr=$start=$end=$name=
done <<EOF
11 1 10 aaa bbb
12 3 30 ccc bbb
EOF
This one splits tab-separated text. Your variant assigns everything to $chr. Always print variable assignments with visible delimiters, '=' for example. :)
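To see the difference directly, here is a minimal sketch. The sample line and the variable names a, b, c, d are made up for illustration. It reads the same tab-separated line once with the IFS=\t spelling from the question and once with a real tab written as $'\t' (bash ANSI-C quoting).

```shell
#!/bin/bash
# Hypothetical tab-separated parameter line (values are made up).
line=$'chr1\t100\t200\tres1'

# Unquoted \t loses its backslash, so IFS is the letter "t", not a tab.
# This line contains no "t", so nothing is split and everything lands in $a.
IFS=\t read -r a b c d <<< "$line"
echo "=$a=$b=$c=$d="

# $'\t' is a real tab character, so the four fields are separated.
IFS=$'\t' read -r chr start end name <<< "$line"
echo "=$chr=$start=$end=$name="
```

With the first form, any letter t in the data would additionally act as a field separator, which is another reason the visible delimiters are worth printing.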
Answer 1 (score: 1)
The key is the IFS:
while IFS=' ' read chr start end name
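What sits between the single quotes is a literal tab character, not a space. In bash, IFS=$'\t' is an equivalent spelling that survives copy and paste.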
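Applied to the loop from the question, that could look like the sketch below. The sample files are created only so that the example runs on its own. Their contents (region_a, the 11-column rows) are made-up assumptions, not the asker's data.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory; replace with the real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\tregion_a\n' > parameterfile
{
    printf 'chr1\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.01\n'   # matches
    printf 'chr1\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.5\n'    # column 11 too large
    printf 'chr2\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.01\n'   # different chr
} > bigfile.out

# IFS is a real tab here, so each parameter lands in its own variable.
while IFS=$'\t' read -r chr start end name
do
    awk -v "chr=$chr" -v "start=$start" -v "end=$end" \
        '$1 == chr && $3 > start && $3 < end && $11 < 5E-2' \
        bigfile.out > "${name}.out"
done < parameterfile
```

The -r option stops read from interpreting backslashes in the parameter file, and quoting "${name}.out" keeps names containing spaces intact.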
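Answer 2 (score: 0)
I don't know what specifically requires bash in between, but if the requirement is just to read input from a file or from the user, then this should work: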
#!/bin/bash
cat parameterfile | awk 'BEGIN {
    FS = "\t";
}
{
    # If parameterfile has multiple lines and you want to put comments in them, perhaps
    #   if ($0 ~ "^[ \t]*#") next;
    # will allow lines starting with # (with any amount of spaces or tabs in front)
    # to be recognized as comments instead of parameters :-)
    #
    # Read the parameter file, whatever format it may be.
    # Here we assume parameterfile is tab-separated, so inside BEGIN{} we set FS to tab.
    # If it is a CSV, then A[0] = split($0, A, ","); and then chr = A[1]; and so on.
    chr = $1;
    start = $2;
    end = $3;
    name = $4;
    # Start reading the data file. The file name could also come from the parameter
    # file, or from a -v var=arg on the awk command line.
    file_to_read_from = "bigfile.out";
    while ((getline line_of_data < file_to_read_from) > 0) {
        # Since I do not have psychic powers to guess the format of the input file,
        # here are some examples.
        # If it is separated by one or more spaces:
        #   B[0] = split(line_of_data, B, "[ ]+");
        # If it is separated by tabs:
        B[0] = split(line_of_data, B, "\t");
        # Check whether the line matches the specified condition
        if (B[1] == chr && B[3] > start && B[3] < end && B[11] < 5E-2) {
            # Print the matching data line (not $0, which is the parameter line)
            # to its destination
            print line_of_data > (name ".out");
        }
    }
    # Done reading all lines from file_to_read_from.
    # Close the opened files, so that we can handle millions of files
    close(file_to_read_from);
    close(name ".out");
    # If parameterfile has multiple lines, the next one is processed now.
    # If you only want the first line of the parameter file to be read, then
    #   exit 0;
    # should get you out of here
}'
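Since the data file is large, a variation worth considering is to read it only once. This is a different technique from the script above, not a correction of it. The sketch below loads every parameter line into arrays first (the usual NR == FNR idiom) and then tests each data row against all stored queries. The array names and the sample files are made up for illustration, and it assumes bigfile.out is tab-separated.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory; replace with the real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\tregion_a\nchr2\t100\t200\tregion_b\n' > parameterfile
{
    printf 'chr1\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.01\n'
    printf 'chr1\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.5\n'
    printf 'chr2\tx\t150\t.\t.\t.\t.\t.\t.\t.\t0.01\n'
} > bigfile.out

awk -F'\t' '
    # First file (parameterfile): remember every query.
    NR == FNR {
        n++
        qchr[n] = $1
        lo[n] = $2 + 0          # force numeric bounds
        hi[n] = $3 + 0
        dest[n] = $4 ".out"
        next
    }
    # Second file (bigfile.out): rows that pass the threshold are
    # compared against each stored query.
    $11 < 5E-2 {
        for (i = 1; i <= n; i++)
            if ($1 == qchr[i] && $3 > lo[i] && $3 < hi[i])
                print > dest[i]
    }
' parameterfile bigfile.out
```

Two trade-offs to keep in mind: every output file stays open until awk exits, so a very long parameter file can run into the open-file limit, and a query with no matching rows produces no file at all instead of an empty one.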