My original observations are as follows:
name Analyte
spring 0.1
winter 0.4
To compute p-values, I ran a bootstrap simulation:
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
.....
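Now I want to compute the empirical p-value. In the original data the winter Analyte is 0.4. If the bootstrap data contains a winter Analyte >= 0.4 some number of times (e.g. once) and the bootstrap was run some number of times (e.g. 100), then the empirical p-value for winter is:
1/100 = 0.01
(the number of times the bootstrap value is equal to or higher than the original value, divided by the total number of observations). For the spring Analyte, the p-value is:
2/100 = 0.02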
I want to compute these p-values with awk. My solution for spring is:
awk -v VAR="spring" '($1==VAR && $2>=0.1) {n++} END {print VAR,"p-value=",n/100}'
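spring p-value= 0.02
What I need help with is passing the original file (which holds the names spring and winter and their Analyte values) to awk, and having awk assign the observed values and the number of observations itself.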
Answer 0 (score: 4)
awk -f script.awk original bootstrap
# Slurp the original file in an array a
# Ignore the header
NR==FNR && NR>1 {
    # Index of this array will be type
    # Value of that type will be original value
    a[$1]=$2
    next
}
# If in the bootstrap file value
# of second column is greater than original value
FNR>1 && $2>a[$1] {
    # Increment an array indexed at first column
    # which is nothing but type
    b[$1]++
}
# Increment another array regardless to identify
# the number of times bootstrapping was done
{
    c[$1]++
}
# for each type in array a
END {
    for (type in a) {
        # print the type and calculate empirical p-value
        # which is done by dividing the number of times higher value
        # of a type was seen and total number of times
        # bootstrapping was done.
        print type, b[type]/c[type]
    }
}
$ cat original
name Analyte
spring 0.1
winter 0.4
$ cat bootstrap
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
$ awk -f script.awk original bootstrap
spring 0.111111
winter 0.111111
The original value for spring is 0.1.
The original value for winter is 0.4.
Bootstrapping was done 9 times in this sample file.
Count of values higher than spring's original value = 1
Count of values higher than winter's original value = 1
So, 1/9 = 0.111111
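Note that the script above counts bootstrap values strictly greater than the original, while the question defines the empirical p-value with >= (values equal to or higher than the original). The following is only a sketch of that variant, assuming the same two sample files. The array names orig, hits and total, the scratch directory, and the `+ 0` used to force numeric comparison are illustrative choices, not part of the script above.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Recreate the sample files shown above.
cat > original <<'EOF'
name Analyte
spring 0.1
winter 0.4
EOF

cat > bootstrap <<'EOF'
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
EOF

# Same two-file idea, but a draw counts when it is >= the original value.
result=$(awk '
    FNR == 1 { next }                        # skip the header of each file
    NR == FNR { orig[$1] = $2 + 0; next }    # first file: remember original values
    {
        total[$1]++                          # bootstrap draws seen for this name
        if ($2 + 0 >= orig[$1]) hits[$1]++   # draws at least as large as the original
    }
    END {
        for (name in orig) print name, hits[name] / total[name]
    }
' original bootstrap | sort)

printf '%s\n' "$result"
```

With this sample data, spring changes from 1/9 to 2/9 because the draw equal to 0.1 is now counted, while winter stays at 1/9. The output is piped through sort only because awk does not guarantee the order of a for-in loop.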
Answer 1 (score: 2)
FNR == NR {
    a[$1] = $2
    next
}
$2 > a[$1] {
    b[$1]++
}
{
    c[$1]++
}
END {
    for (i in a) print i, "p-value=",b[i]/c[i]
}
The output is:
winter p-value= 0.111111
spring p-value= 0.111111
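One caveat: if both files start with a "name Analyte" header line, as the sample files in the first answer do, the rules above also store name in the array a, so the END loop would print an extra "name p-value= 0" line. The sketch below adds a single rule to skip the first line of each file and is otherwise the same script. The scratch directory and the printf-based file setup are only assumptions made to keep the example self-contained.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Same sample data as in the first answer. printf reuses its format
# until all arguments are consumed: one spring/winter pair per draw.
printf '%s\n' 'name Analyte' 'spring 0.1' 'winter 0.4' > original
{
    printf 'name Analyte\n'
    printf 'spring %s\nwinter %s\n' \
        0.001 0  0 0.2  0.03 0  0.01 0.02  0.1 0.5 \
        0 0.04  0.2 0  0 0.06  0 0
} > bootstrap

result=$(awk '
    FNR == 1 { next }                 # added: skip the header line of each file
    FNR == NR { a[$1] = $2; next }    # first file: original value per name
    $2 > a[$1] { b[$1]++ }            # draws strictly greater than the original
    { c[$1]++ }                       # total draws per name
    END { for (i in a) print i, "p-value=", b[i] / c[i] }
' original bootstrap | sort)

printf '%s\n' "$result"
```

The comparison is left as a strict > to match the script above, so the two p-values are unchanged. Only the spurious name line is avoided.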