I have this DataFrame:
import pandas as pd
a = pd.DataFrame([[1,1,1,1],[2,2,2,2],[3,2,2,2],[4,2,4,3],[5,1,2,4]],
                 columns=['a','b','c','d'])
I want to build a correlation table from it, but one that only keeps correlations above 0.4.
Answer 0 (score: 0)
print(a.corr())
          a         b         c         d
a  1.000000  0.000000  0.577350  0.970725
b  0.000000  1.000000  0.583333 -0.080064
c  0.577350  0.583333  1.000000  0.520416
d  0.970725 -0.080064  0.520416  1.000000
b = a.corr()
# replace values <= 0.4 with NaN
print(b.where(b > 0.4))
          a         b         c         d
a  1.000000       NaN  0.577350  0.970725
b       NaN  1.000000  0.583333       NaN
c  0.577350  0.583333  1.000000  0.520416
d  0.970725       NaN  0.520416  1.000000
# replace values <= 0.4 with 0
print(b.where(b > 0.4, 0))
          a         b         c         d
a  1.000000  0.000000  0.577350  0.970725
b  0.000000  1.000000  0.583333  0.000000
c  0.577350  0.583333  1.000000  0.520416
d  0.970725  0.000000  0.520416  1.000000
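Note that b > 0.4 also discards strong negative correlations. If those matter to you, a possible variation is to compare the absolute value against the threshold and then list each surviving pair once. This is only a sketch: the |r| > 0.4 rule and the names corr, strong, upper and pairs are my assumptions, not something from the question.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 2, 2, 2], [4, 2, 4, 3], [5, 1, 2, 4]],
    columns=["a", "b", "c", "d"],
)
corr = a.corr()

# Assumption: "strong" means |r| > 0.4, so strong negative values are kept too.
strong = corr.where(corr.abs() > 0.4)

# Boolean mask for the upper triangle; k=1 leaves out the diagonal,
# so the trivial self-correlations of 1.0 are not reported.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Stack into a Series indexed by (row, column) and drop the masked-out cells.
pairs = strong.where(upper).stack().dropna()
print(pairs)
```

For this data no negative correlation passes the threshold, so the same pairs survive as with b > 0.4; the difference only shows up when a value such as -0.7 is present.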