python: Applying a t-test to every column of a DataFrame

Time: 2018-09-22 02:18:26

Tags: python-3.x pandas dataframe t-test

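I have a DataFrame with about 100,000 columns. The first column holds the labels. The data in every other column falls into two groups: rows where label == 1 and rows where label == 0. The data looks like this: (image not included)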
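Since the screenshot is not available, here is a minimal sketch of the layout described above. The column names `f1` to `f3`, the group sizes, and the random values are made up for illustration; only the `labels` column name comes from the code in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: three numeric feature columns (the real file has ~100,000)
df = pd.DataFrame(rng.normal(size=(8, 3)), columns=['f1', 'f2', 'f3'])

# The first column is the binary label that splits every other column into two groups
df.insert(0, 'labels', [1, 1, 1, 1, 0, 0, 0, 0])

# Number of rows in each group, keyed by label value
group_sizes = df['labels'].value_counts().to_dict()
```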

My goal is to run a t-test on each column, comparing the two groups defined by the labels. Here is my code:

import pandas as pd
from scipy import stats as ss

def t_test(filename):
    df = pd.read_csv(filename)

    column_list = [x for x in df.columns if x != 'labels']
    t_test_results = {}
    for column in column_list:
        non_essential = df.where('labels'==1).dropna()[column]
        essential = df.where('labels'==0).dropna()[column]
        t_test_results[column] = ss.ttest_rel(non_essential, essential)
    result_df = pd.DataFrame.from_dict(t_test_results, orient='Index')
    result_df.columns = ['statistic', 'pvalue']
    return result_df

if __name__ == '__main__':
    result = t_test('encoding_test_data.csv')
    with open('t_test_result.txt', 'w') as f:
        f.write(str(result))

'encoding_test_data.csv' is my test file. I get the following error:

Traceback (most recent call last):
File "E:/master_subject/t_test.py", line 22, in <module>
result = t_test('encoding_test_data.csv')
File "E:/master_subject/t_test.py", line 14, in t_test
non_essential = df.where('labels'==1).dropna()[column]
File "D:\Python37\lib\site-packages\pandas\core\generic.py", line 7772, in where
errors=errors, try_cast=try_cast)
File "D:\Python37\lib\site-packages\pandas\core\generic.py", line 7516, in _where
raise ValueError('Array conditional must be same shape as '
ValueError: Array conditional must be same shape as self 
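A likely reading of this traceback: the expression `'labels'==1` compares a Python string with an integer, so it evaluates to the single value `False` before pandas ever sees it, and `df.where` then rejects a condition that does not match the frame's shape. The sketch below, using a small made-up frame, contrasts that scalar with a per-row boolean mask built from the column itself.

```python
import pandas as pd

# Hypothetical four-row frame, only to illustrate the comparison
df = pd.DataFrame({'labels': [1, 0, 1, 0], 'f1': [0.5, 1.5, 2.5, 3.5]})

# Comparing the string 'labels' with 1 never touches the DataFrame: it is just False
scalar_cond = ('labels' == 1)

# Comparing the column itself gives one boolean per row
mask = df['labels'] == 1

# Select each group's values for a single column with the mask
non_essential = df.loc[mask, 'f1']
essential = df.loc[~mask, 'f1']
```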

How can I achieve my goal?
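One possible approach, offered as a sketch rather than a definitive answer: build a boolean mask from the label column and pass both groups to `scipy.stats.ttest_ind` with `axis=0`, which tests all columns in one call instead of looping over 100,000 of them. This assumes the two groups are independent samples. The question's `ttest_rel` is a paired test and needs both inputs to have the same length, which the two label groups may not have. The function name `t_test_by_label` and its `label_col` parameter are hypothetical.

```python
import pandas as pd
from scipy import stats as ss

def t_test_by_label(df, label_col='labels'):
    """Independent two-sample t-test on every non-label column (assumed setup)."""
    mask = df[label_col] == 1
    features = df.drop(columns=label_col)

    # Rows with label == 1 versus all remaining rows (label == 0 for binary labels)
    group1 = features[mask].to_numpy()
    group0 = features[~mask].to_numpy()

    # axis=0 runs one test per column; equal_var=True is SciPy's default (Student's t)
    result = ss.ttest_ind(group1, group0, axis=0)

    return pd.DataFrame(
        {'statistic': result.statistic, 'pvalue': result.pvalue},
        index=features.columns,
    )
```

Calling `t_test_by_label(pd.read_csv('encoding_test_data.csv'))` would return one row per feature column. If the group variances may differ, `equal_var=False` switches to Welch's test. For saving, `result.to_csv(...)` is probably safer than `str(result)`, because the string form of a large DataFrame is truncated.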

0 answers:

No answers