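I'm trying to learn awk and modified a script I found here: https://www.tecmint.com/learn-use-awk-special-patterns-begin-and-end/ I want to search multiple files for multiple patterns and count them. I tested it in a folder containing three .csv files, of which only the first contains the patterns. This works fine when the patterns are hard-coded.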
Script:
#!/bin/bash
# loop over the CSV files with a glob instead of parsing $(ls *.csv)
for file in *.csv; do
    if [ -f "$file" ]; then
        # print out filename
        echo "File is: $file"
        # print the number of lines matching phn and phnM in the file
        awk ' BEGIN { print "The number of times phn_phnM appears in the file is:" ; }
        /phn/ { counterx+=1 ; }
        /phnM/ { countery+=1 ; }
        END { printf "%s\n", counterx ; }
        END { printf "%s\n", countery ; }
        ' "$file"
    else
        # print error info in case input is not a file
        echo "$file is not a file, please specify a file." >&2 && exit 1
    fi
done
# terminate script with exit code 0 in case of successful execution
exit 0
Output:
bash ./unix_commands/count_genes.awk
File is: omics_collection.csv
The number of times phn_phnM appears in the file is:
970
84
File is: temp.csv
The number of times phn_phnM appears in the file is:
File is: temp2.csv
The number of times phn_phnM appears in the file is:
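Two details about these counts. Each rule counts matching lines, not occurrences, and every line containing phnM also contains phn, so the /phn/ count (970) already includes the 84 phnM lines. Also, a counter that never matched is uninitialized, so printf prints only an empty line for temp.csv and temp2.csv; adding 0 forces a numeric 0. A minimal check on three made-up input lines (the data and the function name overlap_demo are hypothetical):

```shell
#!/bin/bash
# Three hypothetical lines stand in for a CSV file.
overlap_demo() {
    printf 'phn\nphnM\nother\n' |
        awk '/phn/ { x++ } /phnM/ { y++ } END { print x+0, y+0 }'
}
overlap_demo   # prints: 2 1  (the phnM line is counted by both rules)
```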
But when I tried to use variables instead, the script stopped working.
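EDIT: As @Charles Duffy pointed out, this was because awk variables and bash variables are not the same thing, which I was completely unaware of. I adjusted my script so that awk receives the variables set in the shell, and now it does what I want:
#!/bin/bash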
# gene names to search for
GENE1="NA"
GENE2="fadD"
for file in *.csv; do
    if [ -f "$file" ]; then
        # print out filename
        echo "File is: $file"
        # print the number of lines matching each gene in the file;
        # -v copies the shell variables into the awk variables a and b
        awk -v a="$GENE1" -v b="$GENE2" '
        BEGIN { print "The number of times", a, "and", b, "appear in the file:" ; }
        $0 ~ a { a_counter+=1 ; }   # the value of a is used as a regex
        $0 ~ b { b_counter+=1 ; }
        # +0 prints 0 instead of an empty string when nothing matched
        END { print a, a_counter+0 ; }
        END { print b, b_counter+0 ; }
        ' "$file"
    else
        # print error info in case input is not a file
        echo "$file is not a file, please specify a file." >&2 && exit 1
    fi
done
# terminate script with exit code 0 in case of successful execution
exit 0
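Since the goal is several patterns across several files, the per-file shell loop could also be folded into a single awk call. The sketch below is an alternative, not what the script above does, and rests on a few assumptions: the patterns are passed as one comma-separated string (a format invented here), the start of each file is detected with FNR == 1, and index() is used for a literal substring match instead of a regex. The function name count_patterns is hypothetical. Empty files produce no output, because FNR == 1 never fires for them.

```shell
#!/bin/bash
# count_patterns PATTERNS FILE...
# PATTERNS is a comma-separated list (hypothetical format for this sketch).
count_patterns() {
    local pats="$1"
    shift
    awk -v pats="$pats" '
        BEGIN { n = split(pats, p, ",") }
        # FNR resets to 1 at the start of each input file
        FNR == 1 {
            if (NR > 1) report()      # flush counts for the previous file
            file = FILENAME
            for (i = 1; i <= n; i++) count[i] = 0
        }
        {
            # index() is a literal substring test, not a regex match
            for (i = 1; i <= n; i++)
                if (index($0, p[i])) count[i]++
        }
        END { if (NR > 0) report() }  # flush counts for the last file
        function report(    j) {
            print "File is: " file
            for (j = 1; j <= n; j++) print p[j], count[j]
        }
    ' "$@"
}
```

Called as count_patterns "$GENE1,$GENE2" *.csv, it would print a "File is:" line followed by one "pattern count" line per pattern for each non-empty file.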
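I'll have to read up on this "dynamic" search-pattern business to understand what I did there. But I do understand that shell variable expansion doesn't happen inside the awk program, so /a/ as a pattern was actually counting the lines containing the letter a in my file. I also had to replace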
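The difference between the regex constant /a/ and the dynamic regex $0 ~ a shows up on a few made-up lines: /a/ matches any line containing a lowercase letter a, while $0 ~ a uses the contents of the awk variable a (here fadD) as the pattern. The input lines and the function name regex_demo are hypothetical:

```shell
#!/bin/bash
# Three hypothetical input lines; a holds the gene name passed from the shell.
regex_demo() {
    printf 'fadD\nNA\nbanana\n' |
        awk -v a="fadD" '
            /a/    { literal++ }   # regex constant: the letter "a"
            $0 ~ a { dynamic++ }   # dynamic regex: the value of variable a
            END { print literal+0, dynamic+0 }
        '
}
regex_demo   # prints: 2 1
```

Because the value is treated as a regex, a gene name containing metacharacters such as . or + would not match literally; index($0, a) is a plain substring test if that matters.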
END { printf "%s\n", a, a_counter ; }
with
END { print a, a_counter ; }
because printf only printed the value of a and not a_counter, and I couldn't figure out why. I assume that inside awk, a_counter is not recognized as $(GENE1)_counter?
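On the printf question: awk's printf, like C's, prints one argument per conversion specifier in the format string. "%s\n" contains a single %s, so only a is printed and a_counter is left unused. Separately, a_counter is just an ordinary awk variable whose name happens to start with a; awk never expands it into NA_counter. A small demonstration, where fadD and the count 3 are arbitrary demo values and printf_demo is a hypothetical name:

```shell
#!/bin/bash
# Why printf "%s\n", a, a_counter printed only a.
printf_demo() {
    awk -v a="fadD" 'BEGIN {
        a_counter = 3                      # made-up count for the demo
        printf "%s\n", a, a_counter        # one %s: only a is printed
        printf "%s %s\n", a, a_counter     # one %s per argument: both printed
        print a, a_counter                 # print needs no format string
    }'
}
printf_demo
```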