I have this table (NumSucc = number of successes, NumTrials = number of trials, Prob = probability of success):
Gene NumSucc NumTrials Prob
Gene1 16 26 0.9548
Gene2 16 26 0.9548
Gene3 12 21 0.9548
Gene4 17 27 0.9548
Gene5 17 27 0.9548
Gene6 17 27 0.9548
Gene7 8 15 0.9548
Gene8 10 17 0.9548
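I want the cumulative binomial P-value for each row. When I put this exact table into Excel columns A-D and then enter this function in column E (e.g., for row 2):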
=BINOMDIST(B2,C2,D2,1)
the output table looks like this:
Gene NumSucc NumTrials Prob Binomial
Gene1 16 26 0.9548 9.68009E-08
Gene2 16 26 0.9548 9.68009E-08
Gene3 12 21 0.9548 1.40794E-07
Gene4 17 27 0.9548 1.47463E-07
Gene5 17 27 0.9548 1.47463E-07
Gene6 17 27 0.9548 1.47463E-07
Gene7 8 15 0.9548 1.79741E-06
Gene8 10 17 0.9548 5.01334E-06
Alternatively, when I put this exact table into SciPy using the following code:
import glob
import os
import scipy
from scipy.stats.distributions import binom
import sys

def WriteBinomial(InputFile,output):
    # Skip the header row, then process one gene per line
    open_input_file = open(InputFile, 'r').readlines()[1:]
    for line in open_input_file:
        line = line.strip().split()
        GeneName,num_succ,num_trials,prob = line[0],int(line[1]),int(line[2]),float(line[3])
        print(GeneName + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str((binom.cdf(num_succ-1, num_trials, prob))))

WriteBinomial(sys.argv[1],sys.argv[2])
the output is:
GeneName NumSucc NumTrials Prob Binomial
Gene1 16 26 0.9548 6.59829603211e-09
Gene2 16 26 0.9548 6.59829603211e-09
Gene3 12 21 0.9548 7.92014917046e-09
Gene4 17 27 0.9548 1.06754559723e-08
Gene5 17 27 0.9548 1.06754559723e-08
Gene6 17 27 0.9548 1.06754559723e-08
Gene7 8 15 0.9548 8.41770305586e-08
Gene8 10 17 0.9548 2.93060582331e-07
Does anyone know why the two methods do not give the same result?
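Answer 0 (score: 0)
Your Python code passes "num_succ-1", but your Excel formula uses "B2", not "B2-1". Both functions return the cumulative probability P(X <= k), so the Python call computes P(X <= num_succ-1) and leaves out the P(X = num_succ) term that Excel includes.
Python -> "binom.cdf(num_succ-1, num_trials, prob)"
Excel -> "=BINOMDIST(B2,C2,D2,1)"
The code below should produce the same output as Excel.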
import glob
import os
import scipy
from scipy.stats.distributions import binom
import sys

def WriteBinomial(InputFile,output):
    # Skip the header row, then process one gene per line
    open_input_file = open(InputFile, 'r').readlines()[1:]
    for line in open_input_file:
        line = line.strip().split()
        GeneName,num_succ,num_trials,prob = line[0],int(line[1]),int(line[2]),float(line[3])
        # binom.cdf(k, n, p) is P(X <= k); pass num_succ itself to match BINOMDIST(B2,C2,D2,1)
        print(GeneName + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str((binom.cdf(num_succ, num_trials, prob))))

WriteBinomial(sys.argv[1],sys.argv[2])
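To see where the gap between the two outputs comes from, here is a minimal sketch that recomputes both quantities for Gene1 using only the standard library. The helpers binom_pmf and binom_cdf are hypothetical names written for this illustration; they are not SciPy's or Excel's implementation, just the textbook sum of binomial terms.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Hypothetical helper for illustration."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): sum of the pmf from 0 through k, inclusive."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Gene1 from the question: 16 successes in 26 trials, Prob = 0.9548
n, p, k = 26, 0.9548, 16

inclusive = binom_cdf(k, n, p)      # same quantity as BINOMDIST(16, 26, 0.9548, 1)
exclusive = binom_cdf(k - 1, n, p)  # same quantity as binom.cdf(num_succ-1, ...)

print("P(X <= 16) = %.5e" % inclusive)
print("P(X <= 15) = %.5e" % exclusive)
print("P(X  = 16) = %.5e" % binom_pmf(k, n, p))
```

The first value should come out close to the Excel figure for Gene1 and the second close to the original SciPy figure. The two differ by exactly the P(X = 16) term, which is why dropping the "-1" makes the outputs agree.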