Scipy ttest_ind and ranksums

Time: 2018-01-24 14:36:42

Tags: python pandas numpy

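I am trying to run a two-sample t-test. My dataset has 744 rows and 186 columns, and I have already computed the sums and means. My CSV looks like the sample below. I need to compute the t-test and the rank-sum test for each row, because each row represents a separate ID with its corresponding values: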

SRA ID  ERR169499            ERR169498           ERR169497
Label   1                    0                   1
TaxID   PRJEB3251_ERR169499  PRJEB3251_ERR169499 PRJEB3251_ERR169499
333046  0.05                 0.99                99.61
1049    0.03                 2.34                34.33
337090  0.01                 9.78                23.22
99007   22.33                2.90                0.00

Labels 0 and 1 denote case and control, respectively. I need to compute the t-test between the case columns and the ctrl columns.

import pandas as pd
from scipy.stats import ttest_ind, ranksums

df = pd.read_csv('final_out_transposed.csv')
for row in df.iterrows():
  (tt_val, p_ttest) = ttest_ind(df.sum_case, df.sum_ctrl)
  (tr_val, p_ranksum) = ranksums(df.sum_case, df.sum_ctrl)
  print(tt_val)
  print(p_ttest)
  print(tr_val)
  print(p_ranksum)

Please help me.

1 answer:

Answer 0 (score: 0)

I think this is what you are looking for:

# assuming this data is coming from 'sum.csv'
'''
TaxID    sum_case   sum_ctrl  mean_case  mean_ctrl  n_case  n_ctrl
333046   4.76       4.56      xx.xx      xx.xx      xx      xx
1049     45.21      33.22     xx.xx      xx.xx      xx      xx
337090   35.98      16.71     xx.xx      xx.xx      xx      xx  
'''

import pandas as pd
from scipy.stats import ttest_ind

# read in data from 'sum.csv'
df = pd.read_csv('sum.csv')

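# preview the first rows to check that the columns were parsed as expected
print(df.head())

# run the t-test using 'sum_ctrl' and 'sum_case' from the 'sum.csv' data
t_stat, p_value = ttest_ind(df.sum_ctrl, df.sum_case)
print(t_stat, p_value)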
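
The call above compares the two summary columns as a whole. For the row-by-row comparison described in the question, a minimal sketch might look like the following. It assumes the layout shown in the question (sample columns, a `Label` row of 0/1 values, one row per ID). The toy values, the sample names `S1`..`S6`, and the helper `rowwise_tests` are invented for illustration and are not from the original data. Because the question does not make clear which label is the case group, the sketch simply calls the groups "label 1" and "label 0".

```python
import io

import pandas as pd
from scipy.stats import ranksums, ttest_ind

# Hypothetical toy data shaped like the question's CSV (values are made up).
csv_text = """SRA ID,S1,S2,S3,S4,S5,S6
Label,1,0,1,0,1,0
333046,0.05,0.99,0.61,1.20,0.10,0.85
1049,0.03,2.34,0.33,2.10,0.08,1.95
337090,0.01,9.78,0.22,8.50,0.15,7.90
"""


def rowwise_tests(raw):
    """Run ttest_ind and ranksums for each ID row, splitting columns by Label."""
    # The Label row tells us which group each sample column belongs to.
    labels = raw.loc["Label"].astype(float).astype(int)
    # Drop the non-measurement rows; a "TaxID" row is dropped too if present.
    values = raw.drop(index=["Label", "TaxID"], errors="ignore").astype(float)
    group1 = values.loc[:, labels == 1]
    group0 = values.loc[:, labels == 0]

    records = []
    for row_id in values.index:
        t_stat, p_ttest = ttest_ind(group1.loc[row_id], group0.loc[row_id])
        r_stat, p_ranksum = ranksums(group1.loc[row_id], group0.loc[row_id])
        records.append({
            "ID": row_id,
            "t_stat": t_stat,
            "p_ttest": p_ttest,
            "ranksum_stat": r_stat,
            "p_ranksum": p_ranksum,
        })
    return pd.DataFrame(records).set_index("ID")


# index_col=0 makes the first column ("SRA ID") the row index.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
results = rowwise_tests(raw)
print(results)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the CSV path. Note that `ttest_ind` assumes equal variances by default, and with only a few samples per group the rank-sum p-values are coarse, so the output is best treated as a first pass.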