I want to automatically apply the Python code below to different DataFrames.
import pandas as pd
from scipy import stats

df_twitter = pd.read_csv('merged_watsonTwitter.csv')
df_original = pd.read_csv('merged_watsonOriginal.csv')
sample_1_twitter = df_twitter['ID_A'] == "08b56ebc-8eae-41b3-9c86-c79e3be542fd"
sample_1_twitter = df_twitter[sample_1_twitter]
sample_1_original = df_original['ID_B'] == "08b56ebc-8eae-41b3-9c86-c79e3be542fd"
sample_1_original = df_original[sample_1_original]
sample_1_twit_trunc = sample_1_twitter[['raw_score_parent_A','raw_score_child_A']]
sample_1_ori_trunc = sample_1_original[['raw_score_parent_B','raw_score_child_B']]
sample_1_twit_trunc.reset_index(drop=True, inplace=True)
sample_1_ori_trunc.reset_index(drop=True, inplace=True)
sample_1 = pd.concat([sample_1_twit_trunc, sample_1_ori_trunc], axis=1)
sample_1['ID'] = '08b56ebc-8eae-41b3-9c86-c79e3be542fd'
stats.ttest_rel(sample_1['raw_score_child_B'], sample_1['raw_score_child_A'])
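For example, the ID "08b56ebc-8eae-41b3-9c86-c79e3be542fd" in the code above belongs to one particular individual. If I want to compute the t-test for every sample I have, I have to copy and paste the code above and swap in a different ID for each individual.
Is there a way to automate this part of the process: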
df_twitter['ID_A'] == "08b56ebc-8eae-41b3-9c86-c79e3be542fd"
df_original['ID_B'] == "08b56ebc-8eae-41b3-9c86-c79e3be542fd"
sample_1['ID'] = '08b56ebc-8eae-41b3-9c86-c79e3be542fd'
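so that it takes all the IDs I have and runs the whole process automatically?
Finally, I would like to save every result that this call produces: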
stats.ttest_rel(sample_1['raw_score_child_B'], sample_1['raw_score_child_A'])
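Answer 0 (score: 0)
As Klaus said, you need a function that takes parameters. You can try putting your code inside a function. You will probably want to store the IDs in a list or any other iterable collection. You can also store the t-test results in a list.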
ids = ["08b56ebc-8eae-41b3-9c86-c79e3be542fd", "08b56ebc-8eae-41b3-9c86-c79e3be542f4"]

def runTTest(id, df_twitter, df_original):
    # Keep only the rows that belong to this ID in each DataFrame
    sample_1_twitter = df_twitter[df_twitter['ID_A'] == id]
    sample_1_original = df_original[df_original['ID_B'] == id]
    # Keep the score columns and reset the index so the rows line up by position
    sample_1_twit_trunc = sample_1_twitter[['raw_score_parent_A', 'raw_score_child_A']].reset_index(drop=True)
    sample_1_ori_trunc = sample_1_original[['raw_score_parent_B', 'raw_score_child_B']].reset_index(drop=True)
    # Put the two sets of scores side by side and tag them with the ID
    sample_1 = pd.concat([sample_1_twit_trunc, sample_1_ori_trunc], axis=1)
    sample_1['ID'] = id
    return stats.ttest_rel(sample_1['raw_score_child_B'], sample_1['raw_score_child_A'])

# Run the test once per ID and collect the results
t_test_results = []
for id in ids:
    t_test_results.append(runTTest(id, df_twitter, df_original))
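If you would rather not type the IDs by hand, and you want the results in a form that is easy to save, something along these lines might work. It is only a sketch: the helper names `paired_ttest_for_id` and `run_all_ttests` and the output file name are made up for illustration, and it assumes every ID has the same number of rows in both files, in the same order, since a paired test needs matching observations.

```python
import pandas as pd
from scipy import stats


def paired_ttest_for_id(sample_id, df_twitter, df_original):
    """Paired t-test on the child scores of one ID (hypothetical helper)."""
    # Select this ID's child scores and reset the index so rows pair by position
    twitter_scores = df_twitter.loc[
        df_twitter["ID_A"] == sample_id, "raw_score_child_A"
    ].reset_index(drop=True)
    original_scores = df_original.loc[
        df_original["ID_B"] == sample_id, "raw_score_child_B"
    ].reset_index(drop=True)
    # Same argument order as in the question: original first, Twitter second
    result = stats.ttest_rel(original_scores, twitter_scores)
    return {"ID": sample_id, "statistic": result.statistic, "pvalue": result.pvalue}


def run_all_ttests(df_twitter, df_original):
    """Run the test for every ID found in both DataFrames (hypothetical helper)."""
    # Assumption: only IDs present in both files can be compared
    ids = sorted(set(df_twitter["ID_A"]) & set(df_original["ID_B"]))
    rows = [paired_ttest_for_id(i, df_twitter, df_original) for i in ids]
    # One row per ID, ready to be written to disk
    return pd.DataFrame(rows, columns=["ID", "statistic", "pvalue"])


# Usage with the files from the question (the output name is a placeholder):
# df_twitter = pd.read_csv('merged_watsonTwitter.csv')
# df_original = pd.read_csv('merged_watsonOriginal.csv')
# results = run_all_ttests(df_twitter, df_original)
# results.to_csv('ttest_results.csv', index=False)
```

Returning one row per ID means the whole set of results can be saved with a single `to_csv` call. If an ID has a different number of rows in the two files, `stats.ttest_rel` will raise an error, so you may need to decide how to handle those cases first.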