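I am trying to build a correlation table against a target variable, but I have run into a problem.
I only want a list of the top 10 variables most correlated with the variable Delay_Days.
Instead, I am getting the 10 most highly correlated variable pairs across the whole dataset.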
My code:

    def mosthighlycorrelated(mydataframe, numtoreport):
        # find the correlations
        cormatrix = data_clean.corr()
        # set the correlations on the diagonal or lower triangle to zero,
        # so they will not be reported as the highest ones:
        cormatrix *= np.tri(*cormatrix.values.shape, k=-1).T
        # find the top n correlations
        cormatrix = cormatrix.stack()
        cormatrix = cormatrix.reindex(cormatrix.abs().sort_values(ascending=False).index).reset_index()
        # assign human-friendly names
        cormatrix.columns = ["FirstVariable", "SecondVariable", "Correlation"]
        return cormatrix.head(numtoreport)

    mosthighlycorrelated(data_clean['Delay_Days'], 10)
This does not work as intended. How can I fix it?
**Current Output**

    First Variable  Second Variable  Correlation
    A               B                0.9
    C               G                0.85
    B               D                0.7
    A               F                0.65
**Expected Output**

    First Variable  Second Variable  Correlation
    A               B                0.9
    A               F                0.7
    A               C                0.3
    A               D                0.2
    A               E                0.1
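For reference, here is a minimal sketch of one way the expected output might be produced. It is not a confirmed fix. Two things stand out in the code above: the function ignores its `mydataframe` argument and always reads the global `data_clean`, and the call passes a single column (`data_clean['Delay_Days']`) rather than the whole DataFrame. The sketch instead takes the full DataFrame plus the name of the target column, keeps only the target's column of the correlation matrix, and ranks it by absolute value. The function name `top_correlated_with` and the `demo` data are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_with(df, target, n=10):
    """Return the n columns most correlated (by absolute value) with `target`."""
    # Correlate numeric columns only, then keep just the target's column
    # and drop the target's correlation with itself (always 1.0).
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Rank by absolute correlation, strongest first, while keeping the sign.
    order = corr.abs().sort_values(ascending=False).index
    result = corr.reindex(order).head(n).reset_index()
    result.columns = ["SecondVariable", "Correlation"]
    # Repeat the target name so the layout matches the expected output.
    result.insert(0, "FirstVariable", target)
    return result


# Hypothetical demo data standing in for data_clean.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "Delay_Days": x,
    "B": x + rng.normal(scale=0.1, size=200),   # strongly correlated
    "C": -x + rng.normal(scale=1.0, size=200),  # moderately, negatively
    "D": rng.normal(size=200),                  # roughly uncorrelated
})
print(top_correlated_with(demo, "Delay_Days", n=2))
```

With the real data the call would presumably be `top_correlated_with(data_clean, "Delay_Days", 10)`, assuming `Delay_Days` is a numeric column of `data_clean`.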