I have a DataFrame df_corr like this:
A2M.AX ABC.AX AGL.AX AHY.AX ALL.AX AMC.AX AMP.AX
A2M.AX 1.000000 -0.505433 0.687367 0.223044 -0.664764 -0.199477
ABC.AX -0.505433 1.000000 -0.801770 -0.606418 0.860923 0.332359
AGL.AX 0.687367 -0.801770 1.000000 0.394378 -0.917379 -0.193461
AHY.AX 0.223044 -0.606418 0.394378 1.000000 -0.483766 -0.063892
ALL.AX -0.664764 0.860923 -0.917379 -0.483766 1.000000 0.177633
I want to get the index and column names of the cells that match a condition on their values. This is my attempt:
df_corr[(df_corr>0.7)&(df_corr<1)]
A2M.AX ABC.AX AGL.AX AHY.AX ALL.AX AMC.AX AMP.AX
ABC.AX NaN NaN NaN NaN 0.860923 NaN
AGL.AX NaN NaN NaN NaN NaN NaN
AHY.AX NaN NaN NaN NaN NaN NaN
ALL.AX NaN 0.860923 NaN NaN NaN NaN
Expected result:
AGL.AX ALL.AX
AMC.AX ABC.AX
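Answer 0 (score: 3)
Use stack to reshape the DataFrame. It moves the column labels into the index, so the result is a Series with a MultiIndex of (row label, column label) pairs: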
df_corr[(df_corr>0.7)&(df_corr<1)].stack()
Out[79]:
A2M.AX
ABC.AX AMC.AX 0.860923
ALL.AX AGL.AX 0.860923
dtype: float64
df_corr[(df_corr>0.7)&(df_corr<1)].stack().index.values
Out[80]: array([('ABC.AX', 'AMC.AX'), ('ALL.AX', 'AGL.AX')], dtype=object)
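To try the stack approach without the original data, here is a minimal sketch on a small made-up symmetric matrix. The labels X, Y, Z, the values, and the name df_demo are hypothetical and are not the asker's data. The explicit dropna() is a precaution: whether stack() discards NaN on its own depends on the pandas version.

```python
import pandas as pd

# Hypothetical 3x3 correlation-like matrix, for illustration only.
labels = ["X", "Y", "Z"]
df_demo = pd.DataFrame(
    [[1.0, 0.8, -0.2],
     [0.8, 1.0, 0.1],
     [-0.2, 0.1, 1.0]],
    index=labels,
    columns=labels,
)

# Cells failing the condition become NaN in the masked frame.
masked = df_demo[(df_demo > 0.7) & (df_demo < 1)]

# stack() moves the column labels into the index; dropna() removes
# any NaN entries that a given pandas version may have kept.
stacked = masked.stack().dropna()

# Each remaining index entry is a (row label, column label) tuple.
pairs = stacked.index.tolist()
print(pairs)
```

Because the matrix is symmetric, every matching pair appears twice, once as (row, column) and once mirrored.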
Answer 1 (score: 1)
Here is one way using NumPy indexing, which avoids having to subset the DataFrame.
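import numpy as np
# Boolean frame: True where 0.7 < value < 1
condition = df_corr.gt(0.7) & df_corr.lt(1)
# np.where returns the row positions and the column positions as two arrays
x, y = np.where(condition.values)
# Map the positions back to labels
res = list(zip(df_corr.index[x], df_corr.columns[y]))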
[('ABC.AX', 'AMC.AX'), ('ALL.AX', 'AGL.AX')]
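The same idea can be sketched end to end on a small made-up matrix (the labels X, Y, Z, the values, and the name df_demo are hypothetical). Unpacking the result of np.where directly gives one array of row positions and one of column positions, which works for any number of matches.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation-like matrix, for illustration only.
labels = ["X", "Y", "Z"]
df_demo = pd.DataFrame(
    [[1.0, 0.8, -0.2],
     [0.8, 1.0, 0.9],
     [-0.2, 0.9, 1.0]],
    index=labels,
    columns=labels,
)

# Boolean frame: True where 0.7 < value < 1.
condition = df_demo.gt(0.7) & df_demo.lt(1)

# Positions of the True cells, in row-major order.
rows, cols = np.where(condition.to_numpy())

# Translate positions into (index label, column label) pairs.
res = list(zip(df_demo.index[rows], df_demo.columns[cols]))
print(res)
# [('X', 'Y'), ('Y', 'X'), ('Y', 'Z'), ('Z', 'Y')]
```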