Consider the following example, where I iterate over every row, split it into two samples, and run a statistical test on each row:
for index, row in data.iterrows():
    stat, p = mannwhitneyu(row.iloc[:self.neighbors], row.iloc[self.neighbors:], alternative='greater')
    data.loc[index, 'stat'] = stat
    data.loc[index, 'prob'] = p
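Is there any way to speed this up? I have read that apply or vectorization could help, but I don't know how to use either for what I'm trying to do, since I need to run the same test on every row.
Thanks for your help!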
Answer 0 (score: 0)
If you specify axis=1, apply runs over rows. In your case, it would look something like this:
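def change_value(row):
    # Split the row into two samples at self.neighbors and run the test
    stat, p = mannwhitneyu(row.iloc[:self.neighbors], row.iloc[self.neighbors:],
                           alternative='greater')
    # Return the results; assigning to row inside apply does not modify df
    return pd.Series({'stat': stat, 'prob': p})

# Assuming your dataframe is called df and pandas is imported as pd
df[['stat', 'prob']] = df.apply(change_value, axis=1)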
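
Note that apply with axis=1 still calls the test once per row in Python, so the speed-up over iterrows is usually modest. If your SciPy version accepts the axis argument of mannwhitneyu (I believe this means 1.7 or later, so treat that as an assumption), you may be able to run the test on all rows in a single call. Below is a minimal sketch of that idea. The function name rowwise_mannwhitney and the neighbors parameter (standing in for self.neighbors) are made up for illustration, and it assumes every column of the dataframe is numeric.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def rowwise_mannwhitney(data, neighbors):
    """Run a one-sided Mann-Whitney U test on every row in one call.

    Hypothetical helper: the first `neighbors` columns of each row form
    the first sample and the remaining columns form the second sample.
    """
    values = data.to_numpy(dtype=float)
    # axis=1 tells SciPy to treat each row as an independent pair of samples
    result = mannwhitneyu(values[:, :neighbors], values[:, neighbors:],
                          alternative='greater', axis=1)
    out = data.copy()
    out['stat'] = result.statistic
    out['prob'] = result.pvalue
    return out


# Small demonstration on random data: 5 rows, 10 + 12 columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(5, 22)))
demo_result = rowwise_mannwhitney(demo, neighbors=10)
```

It is worth comparing the output with your original loop on a handful of rows before relying on it, especially if your rows are short or contain ties, and passing the method argument explicitly if the p-values need to match exactly.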