Correlation table from a pandas DataFrame

Time: 2017-03-05 05:42:11

Tags: python pandas

I have this table:

import pandas as pd
a = pd.DataFrame([[1,1,1,1],[2,2,2,2],[3,2,2,2],[4,2,4,3],[5,1,2,4]],
                 columns=['a','b','c','d'])

I want to create a table of correlations, but keep only the correlations above 0.4.

1 answer:

Answer 0: (score: 0)

IIUC, you need corr together with where:

print (a.corr())
          a         b         c         d
a  1.000000  0.000000  0.577350  0.970725
b  0.000000  1.000000  0.583333 -0.080064
c  0.577350  0.583333  1.000000  0.520416
d  0.970725 -0.080064  0.520416  1.000000

b = a.corr()
# replace values <= 0.4 with NaN
print (b.where(b > 0.4))
          a         b         c         d
a  1.000000       NaN  0.577350  0.970725
b       NaN  1.000000  0.583333       NaN
c  0.577350  0.583333  1.000000  0.520416
d  0.970725       NaN  0.520416  1.000000

# replace values <= 0.4 with 0
print (b.where(b > 0.4, 0))
          a         b         c         d
a  1.000000  0.000000  0.577350  0.970725
b  0.000000  1.000000  0.583333  0.000000
c  0.577350  0.583333  1.000000  0.520416
d  0.970725  0.000000  0.520416  1.000000
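
If a list of column pairs is easier to read than a masked matrix, the same filter can be applied to the upper triangle only, so that each pair shows up once and the diagonal of 1.0 values is left out. This is only a sketch of one possible approach, not part of the original answer; the names upper, pairs and strong are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 2, 2, 2], [4, 2, 4, 3], [5, 1, 2, 4]],
                 columns=['a', 'b', 'c', 'd'])
b = a.corr()

# Boolean mask that is True strictly above the diagonal (hypothetical helper name).
upper = np.triu(np.ones(b.shape, dtype=bool), k=1)

# Mask everything else to NaN, then reshape to a Series indexed by (row, column).
pairs = b.where(upper).stack()

# Keep the pairs above the threshold; NaN entries never pass this comparison.
strong = pairs[pairs > 0.4]
print(strong)
```

With the data above, the pairs that remain are (a, c), (a, d), (b, c) and (c, d). If strong negative correlations should count as well, comparing pairs.abs() against the threshold instead would be one way to do it.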