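I want a flexible way to print the output of two small awk pipelines in bash, using variables (the pipelines themselves worked originally). I first thought I could store the whole command in a variable, but that did not work, and apparently (see "store awk command in a variable of bash script") it is not a good idea anyway. So I wrote two functions instead, but now I get an "unexpected token" error near "done", even though the code is formatted like the example in the linked question.
Where is my mistake?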
for coverage_file in */*.cov
do
#gene_count=$(awk '{print $5}' $coverage_file |sort | uniq -c | wc -l) #this is apparently not a good idea
#contig_count=$(awk '{print $1}' $coverage_file |sort | uniq -c | wc -l) #this is apparently not a good idea
cmd_gene() { awk '{print $5}' $coverage_file |sort | uniq -c | wc -l }
cmd_contig() { awk '{print $1}' $coverage_file |sort | uniq -c | wc -l }
cmd_gene $coverage_file
cmd_contig $coverage_file
#print "we found", $gene_count, "genes on ",$contig_count" contigs
done
The .cov files look like this:
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_88684 267 5 B10 phnM_032
k141_88684 268 5 B10 phnM_032
k141_88684 269 5 B10 phnM_032
k141_88684 270 5 B10 phnM_032
k141_88684 271 5 B10 phnM_032
k141_88684 272 5 B10 phnM_032
Edit: this combines the accepted answer with one possible way of printing the result explicitly:
#!/bin/bash
#define variables
gene="phnM"
threshold="5"
#define functions
cmd_gene() { awk '{print $5}' $1 |sort | uniq -c | wc -l ; } #semicolon is important here!
cmd_contig() { awk '{print $1}' $1 |sort | uniq -c | wc -l ; } #semicolon is important here!
#loop over files and print results (would be prettier with printf)
for coverage_file in */*.cov
do
echo $gene" was found" $(cmd_gene "$coverage_file") "times on" $(cmd_contig "$coverage_file")" contigs with minimum coverage of" $threshold in $coverage_file
done
Output:
phnM was found 67 times on 65 contigs with minimum coverage of 5 in phnm/test.cov
phnM was found 3 times on 2 contigs with minimum coverage of 5 in test/test.cov
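The comment in the script above notes that the output would be prettier with printf. A minimal sketch of that variant follows, assuming the same column layout as the sample .cov data. The helper names count_unique and report are hypothetical, and sort -u is swapped in for sort | uniq -c because only the number of distinct values is needed.

```shell
#!/bin/bash
# Hypothetical helper: count the distinct values in one whitespace-separated column.
# $1 = column number, $2 = file name (these are the function's own arguments).
count_unique() {
    awk -v col="$1" '{ print $col }' "$2" | sort -u | wc -l
}

# Hypothetical helper: print one summary line for one coverage file.
# $1 = gene label, $2 = threshold, $3 = file name.
report() {
    printf '%s was found %d times on %d contigs with minimum coverage of %d in %s\n' \
        "$1" "$(count_unique 5 "$3")" "$(count_unique 1 "$3")" "$2" "$3"
}

for coverage_file in */*.cov
do
    [ -e "$coverage_file" ] || continue   # skip the literal pattern when nothing matches
    report "phnM" 5 "$coverage_file"
done
```

Quoting "$coverage_file" and the function arguments keeps the script working for file names that contain spaces.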
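Answer 0 (score: 2)
The unexpected token error occurs because, when you define a function, the closing } must either be on its own line or be preceded by a ;. Without that, } is read as one more argument to wc -l, the function body is never closed, and bash eventually fails when it reaches done.
Also, since you use $coverage_file inside the function definition, you do not need to pass it to the function.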
for coverage_file in */*.cov
do
cmd_gene() { awk '{print $5}' $coverage_file |sort | uniq -c | wc -l; }
cmd_contig() { awk '{print $1}' $coverage_file |sort | uniq -c | wc -l; }
cmd_gene
cmd_contig
#print "we found", $gene_count, "genes on ",$contig_count" contigs
done
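The rule about the closing brace can be sketched in isolation. The function names count_lines and count_words are hypothetical and only illustrate the two valid layouts.

```shell
# Form 1: one-line body, so a ';' must come before the closing brace.
count_lines() { wc -l < "$1"; }

# Form 2: the closing brace sits on its own line, so no ';' is needed.
count_words() {
    wc -w < "$1"
}
```

In both forms the } appears where the shell expects a new command to start, which is the only position in which it is recognized as the end of the function body.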
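If you want to define the functions outside the for loop, you can use $1 (not to be confused with awk's $1) and pass $coverage_file as you did before.
Edit: an example of the above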
$ cat a.sh
cmd_gene() { awk '{print $5}' $1 |sort | uniq -c | wc -l; }
cmd_contig() { awk '{print $1}' $1 |sort | uniq -c | wc -l; }
for coverage_file in */*.cov
do
cmd_gene $coverage_file
cmd_contig $coverage_file
done
$ ls */*.cov
bf/a.cov
$ cat */*.cov
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_88684 267 5 B10 phnM_032
k141_88684 268 5 B10 phnM_032
k141_88684 269 5 B10 phnM_032
k141_88684 270 5 B10 phnM_032
k141_88684 271 5 B10 phnM_032
k141_88684 272 5 B10 phnM_032
$ sh a.sh
2
2
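The difference between the shell's $1 and awk's $1 mentioned in this answer comes down to quoting. A small sketch, with the hypothetical function name first_column:

```shell
first_column() {
    # '{ print $1 }' is in single quotes, so the shell leaves it alone and
    # awk sees $1, meaning the first field of each input line.
    # "$1" is in double quotes, so the shell expands it to the function's
    # first argument, here the name of the file to read.
    awk '{ print $1 }' "$1"
}
```

Calling first_column "$coverage_file" would therefore print the first field of every line of that file.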
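Answer 1 (score: 1)
@jas answered your question, so go with that. The following is just a better way to do what you are trying to do, and it is too large and needs too much formatting to fit in a comment: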
awk '
BEGIN {
gene = "phnM"
threshold = "5"
}
{
genes[$5]
contigs[$1]
}
ENDFILE {
printf "%s was found %d times on %d contigs with minimum coverage of %d in %s\n",
gene, length(genes), length(contigs), threshold, FILENAME
delete genes
delete contigs
}
' */*.cov
The above uses GNU awk for ENDFILE, but if necessary it can be adapted to work with other awks:
awk '
BEGIN {
gene = "phnM"
threshold = "5"
}
FNR==1 { prt() }
{
genes[$5]
contigs[$1]
}
END { prt() }
function prt() {
if (fname != "") {
printf "%s was found %d times on %d contigs with minimum coverage of %d in %s\n",
gene, length(genes), length(contigs), threshold, fname
delete genes
delete contigs
}
fname = FILENAME
}
' */*.cov
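Both awk scripts above count distinct values by using them as array keys: merely referencing genes[$5] creates that key once, however often the value occurs. A minimal sketch of the same idea in isolation, wrapped in a hypothetical shell function count_distinct. As an assumption made for portability, it keeps an explicit counter instead of calling length() on the array.

```shell
# Hypothetical helper: count distinct values in one column using awk array keys.
# $1 = column number, $2 = file name.
count_distinct() {
    awk -v col="$1" '
        !($col in seen) { seen[$col]; n++ }   # first time this value is seen
        END { print n + 0 }                   # n + 0 prints 0 for empty input
    ' "$2"
}
```

With the sample .cov data shown in the question, count_distinct 5 file.cov and count_distinct 1 file.cov would each report 2, matching the output of the pipeline-based functions.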
See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons to avoid shell loops when manipulating text.