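Two-sample dependent t-test and rank-sum test in Python

Time: 2018-01-19 12:44:34

Tags: python-3.x pandas numpy scipy t-test

My dataset has two labels: label 0 (case) and label 1 (control). I have already computed the mean for each of the two labels. In addition, I need to compute a two-sample t-test (dependent) and a two-sample rank-sum test. My dataset looks like this: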

SRA ID  ERR169499           ERR169500           ERR169501           mean_ctrl   mean_case
Label   1                   0                   1
TaxID   PRJEB3251_ERR169499 PRJEB3251_ERR169500 PRJEB3251_ERR169501
333046  0.05                0                   0.4
1049    0.03                0.9                 0
337090  0.01                0.6                 0.7

I am new to statistics. So far, my code is:

import numpy as np

label = []
data = {}

# First pass: pick out the label row (the key has a leading space in the file)
x = open('final_out_transposed.csv','rt')
for r in x:
    datas = r.split(',')
    if datas[0] == ' Label':
        label.append(r.split(",")[1:])
label = label[0]
label[-1] = label[-1].replace('\n','')
counter = len(label)

# Second pass: collect the data rows, skipping the three header rows.
# file1 was never defined in the original post; it is assumed here to be
# the same CSV opened a second time.
file1 = open('final_out_transposed.csv','rt')
for row in file1:
    content = row.split(',')
    if content[0]=='SRA ID' or content[0]== 'TaxID' or content[0]==' Label':
        pass
    else:
        dt = row.split(',')
        dt[-1] = dt[-1].replace('\n','')
        data[dt[0]]=dt[1:]
keys = list(data)

# Per-row running sums and means for case (label 0) and control (label 1)
sum_file = open('sum.csv','w')
for key in keys:
    sum_case = 0
    sum_ctrl = 0
    count_case = 0
    count_ctrl = 0
    mean_case = 0
    mean_ctrl = 0
    print(len(label))
    for i in range(counter):
        print(i)
        if label[i] == '0' or label[i] == 0:
            sum_case=np.float64(sum_case)+np.float64(data[key][i])
            count_case = count_case+1
            mean_case = sum_case/count_case
        else:
            sum_ctrl = np.float64(sum_ctrl)+np.float64(data[key][i])
            count_ctrl = count_ctrl+1
            mean_ctrl = sum_ctrl/count_ctrl

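Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

I would use Pandas rather than open to read your csv file. That puts the data in a data frame, which is easier to work with: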

import pandas as pd
data_frame = pd.read_csv('final_out_transposed.csv')
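As a rough sketch of where that could lead, assuming the file is laid out like the table in the question (a Label row, a TaxID row, then one row per taxon), the case and control columns could be separated with a boolean mask built from the Label row. The inline csv_text is a hypothetical stand-in for final_out_transposed.csv (without the mean_ctrl and mean_case columns), and the names raw, labels, values, case, ctrl and means are my own:

```python
import io
import pandas as pd

# Hypothetical stand-in for final_out_transposed.csv; the real layout may differ.
csv_text = """SRA ID,ERR169499,ERR169500,ERR169501
Label,1,0,1
TaxID,PRJEB3251_ERR169499,PRJEB3251_ERR169500,PRJEB3251_ERR169501
333046,0.05,0,0.4
1049,0.03,0.9,0
337090,0.01,0.6,0.7
"""

# Use the first column as the row index; strip stray spaces such as ' Label'.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
raw.index = raw.index.astype(str).str.strip()

# The Label row says which sample columns are case (0) and control (1).
labels = raw.loc["Label"].astype(int)
values = raw.drop(index=["Label", "TaxID"]).astype(float)

case = values.loc[:, labels == 0]
ctrl = values.loc[:, labels == 1]

# Row-wise means per group, replacing the manual sum/count loop.
means = pd.DataFrame({
    "mean_case": case.mean(axis=1),
    "mean_ctrl": ctrl.mean(axis=1),
})
print(means)
```

With the groups held as two data frames, each row's case and control values are available as case.loc[row] and ctrl.loc[row] for whatever test you run next.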

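For a two-sample dependent t-test, you want ttest_rel.

Note that ttest_ind is for independent groups. Since you specifically asked for dependent groups, use ttest_rel.

It is hard to tell from your example where your two columns of sample data are, but suppose I have the following data made up of 'case' and 'control'. I can compute a dependent two-sample t-test with pandas like this: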
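To illustrate why the choice matters, here is a small sketch with made-up paired measurements (the arrays first and second are hypothetical, not from the question). Each pair differs by only a point or two while the values vary widely between subjects, so the two tests can give very different answers:

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical paired measurements: element i of each array is the same subject.
first = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
second = np.array([11.0, 22.0, 32.0, 43.0, 52.0])

# Independent-samples test: ignores the pairing and compares the group means
# against the large between-subject spread.
t_ind, p_ind = ttest_ind(first, second)

# Dependent (paired) test: works on the within-pair differences only.
t_rel, p_rel = ttest_rel(first, second)

print("ttest_ind:", t_ind, p_ind)
print("ttest_rel:", t_rel, p_rel)
```

ttest_rel needs both inputs to have the same length, because it matches observation i in one sample with observation i in the other.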

import pandas as pd
from scipy.stats import ttest_rel
data_frame = pd.DataFrame({
    'case':[55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32],
    'control':[48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25]})
(t_stat, p) = ttest_rel(data_frame['control'], data_frame['case'])
print(t_stat)
print(p)

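p will be the p-value and t_stat will be the t-statistic. You can read more about this in the documentation.

In a similar way, once your sample .csv data is in a data frame, you can run the rank-sum test: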
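If it helps to see what that t-statistic is, the paired test is equivalent to a one-sample t-test on the pairwise differences. The sketch below recomputes it by hand for the same case/control numbers; the variable names d, n and t_manual are mine:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

case = np.array([55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32], dtype=float)
control = np.array([48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25], dtype=float)

# Pairwise differences, in the same order as ttest_rel(control, case).
d = control - case
n = d.size

# t = mean(d) / (sample standard deviation of d / sqrt(n)), with n - 1 degrees of freedom.
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

t_rel, p_rel = ttest_rel(control, case)
t_one, p_one = ttest_1samp(d, 0.0)

print(t_manual, t_rel, t_one)
print(p_rel, p_one)
```

All three t values should agree, which is a quick sanity check that the samples were paired up in the intended order.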

from scipy.stats import ranksums
(t_stat, p) = ranksums(data_frame['control'], data_frame['case'])

documentation for ranksums
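For completeness, here is a self-contained version of the rank-sum call using the same example data. One detail worth knowing is that the first value ranksums returns is a z-type statistic from a normal approximation rather than a t-statistic, so I have named it z_stat here (my naming, not scipy's):

```python
import pandas as pd
from scipy.stats import ranksums

data_frame = pd.DataFrame({
    'case': [55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32],
    'control': [48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25]})

# Wilcoxon rank-sum test; it treats the two columns as independent samples.
z_stat, p = ranksums(data_frame['control'], data_frame['case'])
print(z_stat)
print(p)

# Swapping the arguments flips the sign of the statistic but not the p-value.
z_swapped, p_swapped = ranksums(data_frame['case'], data_frame['control'])
print(z_swapped, p_swapped)
```

Unlike ttest_rel, ranksums does not use the pairing between rows, so the two samples do not need to have the same length.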