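I am trying to run a two-sample t-test. My dataset has 744 rows and 186 columns, and I have already computed the sums and means. My CSV looks like the sample below. I need to compute a t-test and a rank-sum test for each row, because each row represents a separate ID with its corresponding values: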
SRA ID ERR169499 ERR169498 ERR169497
Label 1 0 1
TaxID PRJEB3251_ERR169499 PRJEB3251_ERR169499 PRJEB3251_ERR169499
333046 0.05 0.99 99.61
1049 0.03 2.34 34.33
337090 0.01 9.78 23.22
99007 22.33 2.90 0.00
Labels 0 and 1 denote case and control, respectively. I need to run the t-test between the case columns and the ctrl columns.
import pandas as pd
from scipy.stats import ttest_ind, ranksums

df = pd.read_csv('final_out_transposed.csv')
for row in df.iterrows():
    (tt_val, p_ttest) = ttest_ind(df.sum_case, df.sum_ctrl)
    (tr_val, p_ranksum) = ranksums(df.sum_case, df.sum_ctrl)
    print(tt_val)
    print(p_ttest)
    print(tr_val)
    print(p_ranksum)
Please help me.
Answer 0 (score: 0)
I think this is what you are looking for:
# assuming this data is coming from 'sum.csv'
'''
TaxID sum_case sum_ctrl mean_case mean_ctrl n_case n_ctrl
333046 4.76 4.56 xx.xx xx.xx xx xx
1049 45.21 33.22 xx.xx xx.xx xx xx
337090 35.98 16.71 xx.xx xx.xx xx xx
'''
import pandas as pd
from scipy.stats import ttest_ind
# read in data from 'sum.csv'
df = pd.read_csv('sum.csv')
df.head()
# run ttest using 'sum_ctrl' and 'sum_case' from 'sum.csv' data
ttest_ind(df.sum_ctrl, df.sum_case)