Applying a function to a pandas DataFrame column

Time: 2018-10-05 19:38:16

Tags: python pandas

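A similar question seems to have been answered already, but I can't get it to work for my case.

I have a pandas DataFrame that looks like sig_vars below. This df has a VAF column and a Background column. I want to use the ztest function from statsmodels to assign a p-value to each row in a new p-value column.

The p-value for each row is calculated as follows: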

from statsmodels.stats.weightstats import ztest
p_value = ztest(sig_vars.Background,value=sig_vars.VAF)[1]

I have tried something like this, but I can't quite get it to work:

def calc(x):
    return ztest(x.Background, value=x.VAF.astype(float))[1]

sig_vars.dropna().assign(pval = lambda x: calc(x)).head()

What seems odd to me is that this version works fine:

def calc(x):
    return ztest([0.0001,0.0002,0.0001], value=x.VAF.astype(float))[1]

sig_vars.dropna().assign(pval = lambda x: calc(x)).head()

Here is my DataFrame sig_vars:

import pandas as pd
from numpy import nan

sig_vars = pd.DataFrame({'AO': {0: 4.0, 1: 16.0, 2: 12.0, 3: 19.0, 4: 2.0},
 'Background': {0: nan,
  1: [0.00018832391713747646, 0.0002114408734430263, 0.000247843759294141],
  2: nan,
  3: [0.00023965141612200435,
   0.00018864365214110544,
   0.00036566589684372596,
   0.0005452562704471102],
  4: [0.00017349063150589867]},
 'Change': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
 'Chrom': {0: 'chr1', 1: 'chr1', 2: 'chr1', 3: 'chr1', 4: 'chr1'},
 'ConvChange': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
 'DP': {0: 16945.0, 1: 16945.0, 2: 16969.0, 3: 16969.0, 4: 16969.0},
 'Downstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
 'Gene': {0: 'TIIIa', 1: 'TIIIa', 2: 'TIIIa', 3: 'TIIIa', 4: 'TIIIa'},
 'ID': {0: '86.fastq/onlyProbedRegions.vcf',
  1: '86.fastq/onlyProbedRegions.vcf',
  2: '86.fastq/onlyProbedRegions.vcf',
  3: '86.fastq/onlyProbedRegions.vcf',
  4: '86.fastq/onlyProbedRegions.vcf'},
 'Individual': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'IntEx': {0: 'TIII', 1: 'TIII', 2: 'TIII', 3: 'TIII', 4: 'TIII'},
 'Loc': {0: 115227854, 1: 115227854, 2: 115227855, 3: 115227855, 4: 115227856},
 'Upstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
 'VAF': {0: 0.00023605783416937148,
  1: 0.0009442313366774859,
  2: 0.0007071719017031057,
  3: 0.0011196888443632507,
  4: 0.00011786198361718427},
 'Var': {0: 'A', 1: 'C', 2: 'A', 3: 'C', 4: 'A'},
 'WT': {0: 'T', 1: 'T', 2: 'T', 3: 'T', 4: 'C'}})

1 Answer:

Answer 0 (score: 1)

Try this:

def calc(x):
    return ztest(x['Background'], value=float(x['VAF']))[1]

sig_vars['pval'] = sig_vars.dropna().apply(calc, axis=1)
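
As far as I can tell, the original attempt fails because assign passes the whole DataFrame to the lambda, so x.Background is the entire column of lists rather than one row's list, and ztest cannot average a column of lists. The version with the hard-coded list probably works only because value is allowed to be array-like, so a single sample gets compared against every VAF at once. Below is a minimal, self-contained sketch of the row-wise approach. The names df and row_pvalue are made up for illustration, the small frame is a stand-in for sig_vars that reuses three of its rows, and the guards for missing or single-element backgrounds are my own assumption about how such rows should be handled.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.weightstats import ztest

# Hypothetical stand-in for sig_vars: only the two columns the test needs.
df = pd.DataFrame({
    "Background": [
        np.nan,
        [0.00018832391713747646, 0.0002114408734430263, 0.000247843759294141],
        [0.00017349063150589867],
    ],
    "VAF": [0.00023605783416937148, 0.0009442313366774859, 0.00011786198361718427],
})


def row_pvalue(row):
    """Two-sided one-sample z-test of one row's Background list against its VAF."""
    background = row["Background"]
    # Missing backgrounds are stored as a float NaN rather than a list.
    if not isinstance(background, (list, tuple, np.ndarray)):
        return np.nan
    # Assumption: with fewer than two observations the variance is undefined,
    # so report NaN instead of calling ztest.
    if len(background) < 2:
        return np.nan
    _, pvalue = ztest(background, value=float(row["VAF"]))
    return float(pvalue)


# axis=1 hands each row to row_pvalue as a Series; the result keeps df's index,
# so it can be assigned straight back without calling dropna() first.
df["pval"] = df.apply(row_pvalue, axis=1)
print(df[["VAF", "pval"]])
```

Because the function returns NaN for rows it cannot test, every row stays in the frame and the new column lines up with the existing index.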