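I have a dataset whose correlation matrix looks like this.
How can I display only the 3 highest positive correlations instead of all of them? (The correlation values should also be shown as numbers.)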
Answer 0: (score: 0)
You can mask all the values you are not interested in, as follows.
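import numpy as np
# Work on a float copy so the original matrix is left untouched
corr = np.array(corr, dtype=float)
# Set the diagonal and the (duplicate) lower triangle to -np.inf
corr[np.tril_indices_from(corr)] = -np.inf
# Find the value of the k-th largest correlation over the whole matrix
k = 3
threshold = np.sort(corr, axis=None)[-k]
# Mask all values that are below the threshold
corr[corr < threshold] = np.nan
# Do your plotting as before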
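The masking idea above can be wrapped in a small function. This is only a sketch: `keep_top_k` and the 4x4 matrix are made up for illustration and are not part of the answer. It assumes the input is a square, symmetric correlation matrix with at least `k` distinct pairs.

```python
import numpy as np

def keep_top_k(corr, k=3):
    """Hypothetical helper: return a copy of `corr` in which everything
    except the k largest off-diagonal correlations is NaN."""
    out = np.array(corr, dtype=float)  # float copy, the input is not modified
    # The diagonal holds self-correlations and the lower triangle only
    # repeats the upper one, so both are blanked out first.
    out[np.tril_indices_from(out)] = np.nan
    # Threshold = k-th largest of the remaining values
    values = out[~np.isnan(out)]
    threshold = np.sort(values)[-k]
    # Comparisons with NaN are False, so already-masked cells stay NaN
    out[out < threshold] = np.nan
    return out

# Made-up symmetric matrix, for illustration only
corr = np.array([
    [1.0, 0.2, 0.9, -0.5],
    [0.2, 1.0, 0.4, 0.7],
    [0.9, 0.4, 1.0, 0.1],
    [-0.5, 0.7, 0.1, 1.0],
])
masked = keep_top_k(corr, k=3)
print(masked)
```

If several pairs tie with the threshold value, more than `k` cells can survive, since the comparison keeps everything that is not strictly below the threshold.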
Answer 1: (score: 0)
Demo:
In [156]: df = pd.DataFrame(np.random.randint(1, 6, size=(5, 5))).add_prefix('col').corr()
In [157]: df
Out[157]:
          col0      col1      col2      col3      col4
col0  1.000000  0.000000  0.060193 -0.722222 -0.218218
col1  0.000000  1.000000 -0.233126 -0.215166  0.845154
col2  0.060193 -0.233126  1.000000  0.541736  0.118217
col3 -0.722222 -0.215166  0.541736  1.000000  0.036370
col4 -0.218218  0.845154  0.118217  0.036370  1.000000
In [158]: corr = df.values.copy()
In [159]: corr[np.tril_indices_from(corr)] = np.nan
In [160]: x = pd.DataFrame(corr, columns=df.columns, index=df.index)
In [161]: x.stack(dropna=False).nlargest(3).unstack()
Out[161]:
          col3      col4
col1       NaN  0.845154
col2  0.541736  0.118217
In [162]: sns.heatmap(x.stack(dropna=False).nlargest(3).unstack())
Out[162]: <matplotlib.axes._subplots.AxesSubplot at 0xcacf7b8>
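The same selection can also be sketched without editing the array in place, which may be more robust across pandas versions. `top_k_pairs` is a hypothetical helper name, not something from the answer; the numbers are copied from Out[157] above so the result can be compared with Out[161].

```python
import numpy as np
import pandas as pd

def top_k_pairs(corr_df, k=3):
    """Hypothetical helper: the k largest correlations between distinct columns."""
    # True strictly above the diagonal, so every pair is counted exactly once
    upper = np.triu(np.ones(corr_df.shape, dtype=bool), k=1)
    # where() returns a new frame; cells outside the mask become NaN
    pairs = corr_df.where(upper).stack().dropna()
    return pairs.nlargest(k)

# Values copied from Out[157]
cols = ["col0", "col1", "col2", "col3", "col4"]
corr_df = pd.DataFrame(
    [
        [1.000000, 0.000000, 0.060193, -0.722222, -0.218218],
        [0.000000, 1.000000, -0.233126, -0.215166, 0.845154],
        [0.060193, -0.233126, 1.000000, 0.541736, 0.118217],
        [-0.722222, -0.215166, 0.541736, 1.000000, 0.036370],
        [-0.218218, 0.845154, 0.118217, 0.036370, 1.000000],
    ],
    index=cols,
    columns=cols,
)

top = top_k_pairs(corr_df, k=3)
print(top)            # long format: one row per column pair
print(top.unstack())  # wide format, suitable for a heatmap
```

To get the numbers printed inside the cells, as the question asks, the wide frame can be passed to `sns.heatmap` with `annot=True`.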