Optimizing pairwise mutual information scores

Time: 2016-03-27 14:21:23

Tags: python pandas itertools

I'm trying to compute the mutual information score between every pair of columns in a pandas DataFrame:

import pandas as pd
from itertools import combinations
from sklearn.metrics.cluster import adjusted_mutual_info_score

# `train` is the input DataFrame (loaded elsewhere); every column except "ID" is scored
current_valid_columns = list(train.columns.difference(["ID"]))

MI_scores = pd.DataFrame(columns=["features_pair", "adjusted_mutual_information"])

current_index = 0
# Score every unordered pair of columns and append one row per pair
for columns_pair in combinations(current_valid_columns, 2):
    row = pd.Series([str(columns_pair), adjusted_mutual_info_score(train[columns_pair[0]], train[columns_pair[1]])])
    MI_scores.loc[current_index] = row.values
    current_index += 1
MI_scores.to_csv("adjusted_mutual_information_score.csv", sep="|", index=False)

This works, but it is very slow on DataFrames with many columns. How can I optimize it?
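A minimal sketch of one way to reduce the overhead, not taken from the original thread and offered under stated assumptions. It collects the scores in a list and builds the result DataFrame once, instead of enlarging `MI_scores` one row at a time with `.loc` (which is known to be slow). It encodes each column to integer codes once with `pd.factorize`, and it lets joblib (a dependency of scikit-learn) score the independent pairs in parallel. The function name `pairwise_ami` and its `exclude`/`n_jobs` parameters are hypothetical. The number of pairs still grows as n(n-1)/2, and the expected-mutual-information term inside `adjusted_mutual_info_score` can dominate for high-cardinality columns, so this trims overhead rather than changing the overall complexity.

```python
from itertools import combinations

import pandas as pd
from joblib import Parallel, delayed
from sklearn.metrics.cluster import adjusted_mutual_info_score


def pairwise_ami(df, exclude=("ID",), n_jobs=1):
    """Adjusted mutual information for every unordered pair of columns.

    Hypothetical helper (a sketch, not a library API): encode each column
    once, score pairs independently, build the result in a single step.
    """
    # Same column selection as the question: all columns except those excluded
    columns = list(df.columns.difference(list(exclude)))
    # Encode each column to integer codes once. Assumption: pd.factorize maps
    # NaN to -1, so missing values are treated as one extra label.
    codes = {col: pd.factorize(df[col])[0] for col in columns}
    pairs = list(combinations(columns, 2))
    # Pairs are independent, so joblib can spread them over n_jobs workers;
    # with n_jobs=1 they simply run sequentially in this process.
    scores = Parallel(n_jobs=n_jobs)(
        delayed(adjusted_mutual_info_score)(codes[a], codes[b]) for a, b in pairs
    )
    # Build the DataFrame once instead of growing it row by row with .loc
    return pd.DataFrame({
        "features_pair": [str(p) for p in pairs],
        "adjusted_mutual_information": scores,
    })
```

Usage mirrors the original script: `pairwise_ami(train, n_jobs=-1).to_csv("adjusted_mutual_information_score.csv", sep="|", index=False)`. Because AMI only depends on how rows are grouped, not on the label values, the factorized codes should give the same scores as the raw columns (apart from the NaN assumption above). With `n_jobs=-1` joblib uses all cores, but for cheap pairs the cost of dispatching work to processes can outweigh the gain, so it is worth timing both settings.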

0 answers