Converting a vertical (long-format) table to a correlation matrix in Python

Date: 2016-03-06 15:54:11

Tags: python numpy pandas matrix correlation

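I created a correlation matrix from my DataFrame with the pd.DataFrame.corr() method, then did some processing in which I cut off certain values, and ended up with a table like DF_interactions below. I now want to turn it back into correlation-matrix form, like DF_corr below.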

What is the most efficient way to convert an interaction table into a correlation-style matrix using pandas, numpy, sklearn, or scipy?

I have included my naive way of filling this DataFrame:

import numpy as np
import pandas as pd

#Create table of interactions
DF_interactions=pd.DataFrame([["A","B",0.1],
                              ["A","C",0.4],
                              ["B","C",0.3],
                              ["A","D",0.4]],columns=["var1","var2","corr"])
#   var1 var2  corr
# 0    A    B   0.1
# 1    A    C   0.4
# 2    B    C   0.3
# 3    A    D   0.4
n,m = DF_interactions.shape
#4 3
#Show which labels would be in correlation matrix for rows/columns
nodes = set(DF_interactions["var1"]) | set(DF_interactions["var2"])
#set(['A', 'C', 'B', 'D'])

#Create empty DataFrame to fill
DF_corr = pd.DataFrame(np.zeros((len(nodes),len(nodes))), columns = sorted(nodes),index=sorted(nodes))
#    A  B  C  D
# A  0  0  0  0
# B  0  0  0  0
# C  0  0  0  0
# D  0  0  0  0

#Naive way to fill it
for i in range(n):
    var1 = DF_interactions.iloc[i,0]
    var2 = DF_interactions.iloc[i,1]
    corr = DF_interactions.iloc[i,2]
    DF_corr.loc[var1,var2] = corr
    DF_corr.loc[var2,var1] = corr
#      A    B    C    D
# A  0.0  0.1  0.4  0.4
# B  0.1  0.0  0.3  0.0
# C  0.4  0.3  0.0  0.0
# D  0.4  0.0  0.0  0.0
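
For reference, the same fill can be written without a Python-level loop. The following is only a sketch using NumPy integer indexing, not something from the original post; the names labels, rows, cols and values are introduced here for illustration, and it assumes each pair of variables appears at most once in the table.

```python
import numpy as np
import pandas as pd

DF_interactions = pd.DataFrame([["A", "B", 0.1],
                                ["A", "C", 0.4],
                                ["B", "C", 0.3],
                                ["A", "D", 0.4]], columns=["var1", "var2", "corr"])

# Sorted labels define the row/column order of the matrix
labels = sorted(set(DF_interactions["var1"]) | set(DF_interactions["var2"]))
label_index = pd.Index(labels)

# Translate each label into its integer position in the matrix
rows = label_index.get_indexer(DF_interactions["var1"])
cols = label_index.get_indexer(DF_interactions["var2"])

# Fill both triangles at once so the result is symmetric
values = np.zeros((len(labels), len(labels)))
corr_values = DF_interactions["corr"].to_numpy()
values[rows, cols] = corr_values
values[cols, rows] = corr_values

DF_corr = pd.DataFrame(values, index=labels, columns=labels)
```

Pairs that do not appear in the table stay at 0, as in the loop version.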

1 Answer:

Answer 0 (score: 1)

Assuming your interaction table contains only one half of the correlations (add .drop_duplicates() if you are not sure):

corr = pd.concat([DF_interactions, DF_interactions.rename(columns={'var1': 'var2', 'var2': 'var1'})])

Then use .pivot():

corr = corr.pivot(index='var1', columns='var2', values='corr')

var2    A    B    C    D
var1                    
A     NaN  0.1  0.4  0.4
B     0.1  NaN  0.3  NaN
C     0.4  0.3  NaN  NaN
D     0.4  NaN  NaN  NaN

If you want 0 instead of NaN for missing interactions, use .fillna(0).
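
Putting the steps together, a minimal end-to-end sketch might look like the following. The helper name interactions_to_matrix and its fill_value parameter are hypothetical, introduced only for this example; the sketch assumes the column names var1, var2 and corr from the question.

```python
import pandas as pd

def interactions_to_matrix(df, fill_value=0.0):
    """Hypothetical helper: build a symmetric matrix from a var1/var2/corr table."""
    # Mirror the table so every (var1, var2) pair also appears as (var2, var1)
    mirrored = df.rename(columns={"var1": "var2", "var2": "var1"})
    both = pd.concat([df, mirrored], ignore_index=True)
    # pivot() raises on repeated (var1, var2) pairs, so keep the first of each
    both = both.drop_duplicates(subset=["var1", "var2"])
    matrix = both.pivot(index="var1", columns="var2", values="corr")
    # Pairs with no recorded interaction (including the diagonal) become fill_value
    return matrix.fillna(fill_value)

DF_interactions = pd.DataFrame([["A", "B", 0.1],
                                ["A", "C", 0.4],
                                ["B", "C", 0.3],
                                ["A", "D", 0.4]], columns=["var1", "var2", "corr"])

DF_corr = interactions_to_matrix(DF_interactions)
```

Note that the diagonal is also filled with fill_value here, which matches the DF_corr in the question; if you need 1.0 on the diagonal, you have to set it separately.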