这似乎也得到了类似的回答,但我无法使其正常工作。
我有一个熊猫数据框,看起来像下面的sig_vars
。此df有一个VAF
和一个Background
列。我想使用statsmodels中的ztest
函数将p值分配给新的p-value
列。
每行的p值计算如下:
from statsmodels.stats.weightstats import ztest
p_value = ztest(sig_vars.Background,value=sig_vars.VAF)[1]
我已经尝试过类似的方法,但是我无法完全起作用:
def calc(x):
return ztest(x.Background, value=x.VAF.astype(float))[1]
sig_vars.dropna().assign(pval = lambda x: calc(x)).head()
对我来说,奇怪的是,它工作得很好:
def calc(x):
return ztest([0.0001,0.0002,0.0001], value=x.VAF.astype(float))[1]
sig_vars.dropna().assign(pval = lambda x: calc(x)).head()
这是我的数据框sig_vars
:
sig_vars = pd.DataFrame({'AO': {0: 4.0, 1: 16.0, 2: 12.0, 3: 19.0, 4: 2.0},
'Background': {0: nan,
1: [0.00018832391713747646, 0.0002114408734430263, 0.000247843759294141],
2: nan,
3: [0.00023965141612200435,
0.00018864365214110544,
0.00036566589684372596,
0.0005452562704471102],
4: [0.00017349063150589867]},
'Change': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
'Chrom': {0: 'chr1', 1: 'chr1', 2: 'chr1', 3: 'chr1', 4: 'chr1'},
'ConvChange': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
'DP': {0: 16945.0, 1: 16945.0, 2: 16969.0, 3: 16969.0, 4: 16969.0},
'Downstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
'Gene': {0: 'TIIIa', 1: 'TIIIa', 2: 'TIIIa', 3: 'TIIIa', 4: 'TIIIa'},
'ID': {0: '86.fastq/onlyProbedRegions.vcf',
1: '86.fastq/onlyProbedRegions.vcf',
2: '86.fastq/onlyProbedRegions.vcf',
3: '86.fastq/onlyProbedRegions.vcf',
4: '86.fastq/onlyProbedRegions.vcf'},
'Individual': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'IntEx': {0: 'TIII', 1: 'TIII', 2: 'TIII', 3: 'TIII', 4: 'TIII'},
'Loc': {0: 115227854, 1: 115227854, 2: 115227855, 3: 115227855, 4: 115227856},
'Upstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
'VAF': {0: 0.00023605783416937148,
1: 0.0009442313366774859,
2: 0.0007071719017031057,
3: 0.0011196888443632507,
4: 0.00011786198361718427},
'Var': {0: 'A', 1: 'C', 2: 'A', 3: 'C', 4: 'A'},
'WT': {0: 'T', 1: 'T', 2: 'T', 3: 'T', 4: 'C'}})
答案 0 :(得分:1)
尝试一下:
def calc(x):
return ztest(x['Background'], value=float(x['VAF']))[1]
sig_vars['pval'] = sig_vars.dropna().apply(calc, axis=1)