I am trying to compute the mutual information score between every pair of columns in a pandas DataFrame:
import pandas as pd
from itertools import combinations
from sklearn.metrics.cluster import adjusted_mutual_info_score

# Every column except the identifier
current_valid_columns = list(train.columns.difference(["ID"]))
MI_scores = pd.DataFrame(columns=["features_pair", "adjusted_mutual_information"])
current_index = 0
for columns_pair in combinations(current_valid_columns, 2):
    # Score one pair of columns and append it as a new row
    row = pd.Series([str(columns_pair), adjusted_mutual_info_score(train[columns_pair[0]], train[columns_pair[1]])])
    MI_scores.loc[current_index] = row.values
    current_index += 1
MI_scores.to_csv("adjusted_mutual_information_score.csv", sep="|", index=False)
This works, but it is very slow on DataFrames with a large number of columns. How can I optimize it?
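
For reference, a minimal sketch of one possible restructuring. It assumes that growing the result with `.loc` on every iteration contributes noticeably to the runtime, which has not been measured here; the score computation itself may well dominate. The helper name `pairwise_ami`, its `exclude` parameter, and the small `demo` frame are made up for illustration and stand in for `train`.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import adjusted_mutual_info_score


def pairwise_ami(frame, exclude=("ID",)):
    """Return one row per column pair with its adjusted mutual information."""
    # Same column selection as in the question (sorted, minus excluded names).
    columns = list(frame.columns.difference(list(exclude)))
    # Extract each column once as a NumPy array instead of indexing the
    # DataFrame twice per pair inside the loop.
    arrays = {c: frame[c].to_numpy() for c in columns}
    # Collect plain tuples and build the DataFrame a single time at the end,
    # rather than enlarging it row by row.
    rows = [
        (str((a, b)), adjusted_mutual_info_score(arrays[a], arrays[b]))
        for a, b in combinations(columns, 2)
    ]
    return pd.DataFrame(rows, columns=["features_pair", "adjusted_mutual_information"])


# Hypothetical toy data standing in for `train`.
demo = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "a": [0, 0, 1, 1],
    "b": [1, 1, 0, 0],
    "c": [0, 1, 0, 1],
})
scores = pairwise_ami(demo)
```

The result can be written out with the same `to_csv` call as before. Because each pair is scored independently, the list comprehension would also be a natural place to parallelize (for example with joblib) if the scoring turns out to be the bottleneck.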