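I'm trying to use the `awk` below to make the output look like the desired output, and I'm having some trouble with the syntax. The part I seem to be stuck on is taking the bases (`$3`) within a specific target (`$1`) that have fewer than 30 reads, outputting how many there are, and calculating their average. Thank you :).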
awk '
{N[$1]++
T[$1]+=$4
M[$1]=$2
}
END {for (X in N) printf ("%s is %d bases and maps to %s with an average depth"\
" of %f reads\n", X, N[X], M[X], T[X]/N[X]);
}
' input.txt > output.txt
Input
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:955542-955763 AGRN:exon.1 3 0
chr1:955542-955763 AGRN:exon.1 4 1
chr1:955542-955763 AGRN:exon.1 5 1
chr1:955542-955763 AGRN:exon.1 6 1
chr1:955542-955763 AGRN:exon.1 7 1
chr1:955542-955763 AGRN:exon.1 8 1
chr1:955542-955763 AGRN:exon.1 9 1
chr1:955542-955763 AGRN:exon.1 10 1
chr1:955542-955763 AGRN:exon.1 11 32
Current output
chr1:955542-955763 is 11 bases and maps to AGRN:exon.1 with an average depth of 3.545455 reads
Desired output
chr1:955542-955763 is 11 bases and maps to AGRN:exon.1 with an average depth of 3.54 reads and there are 10 bases less than 30 reads with an average coverage of 0.63 reads
Edit (field descriptions)
awk '{for (i=1; i<=NF; i++) print i, $i}' input.txt
1 chr1:955542-955763 (defines the specific target location) - variable N
2 AGRN:exon.1 (defines the name/id of the target location) - variable M
3 1 (defines the exact base on the target)
4 0 (used to calculate the average) - variable T
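The first part of the output seems to work perfectly; I'm only stuck on adding the second part, which is basically `and there are 10 bases less than 30 reads with an average coverage of 0.63 reads`, where `10` is the last base in `$3` that has fewer than 30 reads and `0.63` is the average of the values in `$4`. I hope this helps, thank you :).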
2-D output
Lo: chr1:955542-955763 is 10 bases and maps to AGRN:exon.1 with an average depth of 0.700000 reads
Hi: chr1:955542-955763 is 1 bases and maps to AGRN:exon.1 with an average depth of **2.909091** reads ( should be 32 - `$4` is 32 / 1)
Answer 0 (score: 1)
Updated answer
For thresholded, 2-D style output I would fall back on GNU awk, which supports true 2-D arrays:
gawk '
{ i=1 # use second index of 1 for $4 < 30
if($4>=30)i=2 # use second index of 2 for $4 >= 30
N[$1][i]++
T[$1][i]+=$4
B[$1][i]++
M[$1][i]=$2
}
END {
for (X in N){
printf ("Lo: %s is %d bases and maps to %s with an average depth"\
" of %f reads\n", X, N[X][1], M[X][1], T[X][1]/B[X][1]);
printf ("Hi: %s is %d bases and maps to %s with an average depth"\
" of %f reads\n", X, N[X][2], M[X][2], T[X][2]/B[X][2]);
}
} ' input.txt
Output
Lo: chr1:955542-955763 is 10 bases and maps to AGRN:exon.1 with an average depth of 0.700000 reads
Hi: chr1:955542-955763 is 1 bases and maps to AGRN:exon.1 with an average depth of 32.000000 reads
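Two caveats about the `gawk` version, with a possible workaround. It needs GNU awk 4 or later for `N[$1][i]`, and a target with no bases at 30 or more reads would leave `B[X][2]` empty, so the `Hi` line would divide by zero. The sketch below is one way around both, assuming a POSIX awk: it uses comma subscripts (`N[$1, i]`) instead of true 2-D arrays and prints an average of 0 for an empty bucket (that fallback value is my assumption, not something the question specifies). It recreates the sample input so that it runs on its own; `bucket_report` is just an illustrative variable name. Dividing each bucket's sum by that bucket's own count is what gives 32 / 1 = 32 for `Hi`; the 2.909091 in the question is consistent with dividing 32 by all 11 bases.

```shell
#!/bin/sh
# Recreate the question's sample input: target, name, base position, depth.
cat > input.txt <<'EOF'
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:955542-955763 AGRN:exon.1 3 0
chr1:955542-955763 AGRN:exon.1 4 1
chr1:955542-955763 AGRN:exon.1 5 1
chr1:955542-955763 AGRN:exon.1 6 1
chr1:955542-955763 AGRN:exon.1 7 1
chr1:955542-955763 AGRN:exon.1 8 1
chr1:955542-955763 AGRN:exon.1 9 1
chr1:955542-955763 AGRN:exon.1 10 1
chr1:955542-955763 AGRN:exon.1 11 32
EOF

bucket_report=$(awk '
{
    i = ($4 >= 30) ? 2 : 1      # bucket 1: depth < 30, bucket 2: depth >= 30
    N[$1, i]++                  # number of bases per target and bucket
    T[$1, i] += $4              # summed depth per target and bucket
    M[$1] = $2                  # name of the target
}
END {
    for (X in M) {
        for (i = 1; i <= 2; i++) {
            n = N[X, i] + 0                    # 0 if the bucket never occurred
            avg = (n > 0) ? T[X, i] / n : 0    # guard against division by zero
            printf("%s: %s is %d bases and maps to %s with an average depth of %f reads\n",
                   (i == 1 ? "Lo" : "Hi"), X, n, M[X], avg)
        }
    }
}' input.txt)

printf '%s\n' "$bucket_report"
```

With more than one target in the file, the order in which `for (X in M)` visits targets is unspecified, so pipe the result through `sort` if a stable order matters.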
Original answer
I think you want something like this, which ignores lines where the last field is 30 or more:
awk '
$4 < 30 {
N[$1]++
T[$1]+=$4
B[$1]=$3
M[$1]=$2
}
END {
for (X in N) printf ("%s is %d bases and maps to %s with an average depth"\
" of %f reads\n", X, N[X], M[X], T[X]/B[X]);
} ' input.txt
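If the goal is the single combined line from the question, a sketch along these lines might work. It keeps the overall totals and, separately, a count and sum for the bases below the threshold. Several things here are assumptions on my part: the names `limit`, `LN`, `LT` and `combined` are only illustrative, the low-depth average is divided by the number of low-depth bases (10), and `%.2f` rounds rather than truncates. As a result it prints 3.55 and 0.70 where the question shows 3.54 and 0.63 (0.63 looks like 7 / 11 truncated), so adjust the divisor if that is what you actually need.

```shell
#!/bin/sh
# Recreate the question's sample input: target, name, base position, depth.
cat > input.txt <<'EOF'
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:955542-955763 AGRN:exon.1 3 0
chr1:955542-955763 AGRN:exon.1 4 1
chr1:955542-955763 AGRN:exon.1 5 1
chr1:955542-955763 AGRN:exon.1 6 1
chr1:955542-955763 AGRN:exon.1 7 1
chr1:955542-955763 AGRN:exon.1 8 1
chr1:955542-955763 AGRN:exon.1 9 1
chr1:955542-955763 AGRN:exon.1 10 1
chr1:955542-955763 AGRN:exon.1 11 32
EOF

combined=$(awk -v limit=30 '
{
    N[$1]++                     # all bases in the target
    T[$1] += $4                 # summed depth over all bases
    M[$1] = $2                  # name of the target
    if ($4 < limit) {           # bases below the read threshold
        LN[$1]++
        LT[$1] += $4
    }
}
END {
    for (X in N) {
        lo_avg = (LN[X] > 0) ? LT[X] / LN[X] : 0   # avoid dividing by zero
        printf("%s is %d bases and maps to %s with an average depth of %.2f reads" \
               " and there are %d bases less than %d reads with an average coverage of %.2f reads\n",
               X, N[X], M[X], T[X] / N[X], LN[X], limit, lo_avg)
    }
}' input.txt)

printf '%s\n' "$combined"
```

Passing the threshold with `-v limit=30` keeps the 30 out of the program text, so the same script can be rerun with a different cutoff.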