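I have a DataFrame with about 100,000 columns. The first column is 'labels'. The data in each column falls into two groups: the rows where labels == 1 and the rows where labels == 0. It looks like this: [image]
My goal is to run a t-test on each column, comparing the two label groups. Here is my code: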
import pandas as pd
from scipy import stats as ss

def t_test(filename):
    df = pd.read_csv(filename)
    column_list = [x for x in df.columns if x != 'labels']
    t_test_results = {}
    for column in column_list:
        non_essential = df.where('labels'==1).dropna()[column]
        essential = df.where('labels'==0).dropna()[column]
        t_test_results[column] = ss.ttest_rel(non_essential, essential)
    result_df = pd.DataFrame.from_dict(t_test_results, orient='Index')
    result_df.columns = ['statistic', 'pvalue']
    return result_df

if __name__ == '__main__':
    result = t_test('encoding_test_data.csv')
    with open('t_test_result.txt', 'w') as f:
        f.write(str(result))
'encoding_test_data.csv' is my test file. I get this error:
Traceback (most recent call last):
  File "E:/master_subject/t_test.py", line 22, in <module>
    result = t_test('encoding_test_data.csv')
  File "E:/master_subject/t_test.py", line 14, in t_test
    non_essential = df.where('labels'==1).dropna()[column]
  File "D:\Python37\lib\site-packages\pandas\core\generic.py", line 7772, in where
    errors=errors, try_cast=try_cast)
  File "D:\Python37\lib\site-packages\pandas\core\generic.py", line 7516, in _where
    raise ValueError('Array conditional must be same shape as '
ValueError: Array conditional must be same shape as self
How can I achieve my goal?
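
For reference, one possible direction, offered as a sketch rather than a definitive fix. The expression 'labels'==1 compares a string to an integer in plain Python and evaluates to False, so df.where() receives a scalar instead of a mask shaped like the DataFrame, which matches the ValueError above. The sketch below builds the mask from df['labels'] instead. It also swaps ttest_rel for ttest_ind on the assumption that the two label groups are independent samples that may differ in size (ttest_rel expects paired samples of equal length), and it uses Welch's variant (equal_var=False). The function name t_test_by_label and the label_col parameter are made up for illustration, and the data is assumed to be numeric with no missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats


def t_test_by_label(df, label_col="labels"):
    """Independent two-sample t-test on every non-label column (hypothetical helper)."""
    feature_cols = [c for c in df.columns if c != label_col]

    # Build boolean masks from the column's values, not from the string 'labels'.
    group1 = df.loc[df[label_col] == 1, feature_cols].to_numpy(dtype=float)
    group0 = df.loc[df[label_col] == 0, feature_cols].to_numpy(dtype=float)

    # axis=0 tests all columns in one vectorized call instead of a Python loop.
    # Assumption: groups are independent; equal_var=False selects Welch's t-test.
    statistic, pvalue = stats.ttest_ind(group1, group0, axis=0, equal_var=False)

    return pd.DataFrame(
        {"statistic": np.asarray(statistic), "pvalue": np.asarray(pvalue)},
        index=feature_cols,
    )
```

It could then be called as result = t_test_by_label(pd.read_csv('encoding_test_data.csv')). Writing the output with result.to_csv('t_test_result.txt') keeps every row, whereas str(result) abbreviates large frames.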