I created a "3D" pandas DataFrame (a DataFrame with two-level MultiIndex columns) like this:
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
# Transpose so that each (A, B) pair becomes one column
df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
# Drop every row that contains a NaN
df = df.dropna(axis=0, how='any')
The resulting DataFrame looks like this:
   ECFP              FCFP              RDK5
      R  tau  RMSEc     R  tau  RMSEc     R  tau  RMSEc
0   0.1  0.4    1.2   0.4  0.2    1.0   0.4  0.6    1.2
1   0.3  0.3    1.3   0.3  0.3    1.2   0.7  0.5    1.5
2   0.5  0.3    1.1   0.4  0.3    1.0   0.5  0.3    1.3
4   0.6  0.4    1.5   0.1  0.5    1.2   0.6  0.3    1.5
5   0.4  0.3    1.0   0.2  0.6    1.0   0.6  0.5    1.0
How can I get the correlation matrix between the 'R' values of all data types ('ECFP', 'FCFP', 'RDK5')?
Answer 0 (score: 4)
Use IndexSlice:
In [53]: df.loc[:, pd.IndexSlice[:, 'R']]
Out[53]:
   ECFP  FCFP  RDK5
      R     R     R
0   0.1   0.4   0.4
1   0.3   0.3   0.7
2   0.5   0.4   0.5
4   0.6   0.1   0.6
5   0.4   0.2   0.6
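To get from this selection to the correlation matrix the question asks for, calling .corr() on the result should be enough. Here is a minimal sketch, assuming the DataFrame is rebuilt from the table printed in the question. The names cols, data, r_values and corr are mine, not from the original post, and sorting the columns first is only a precaution for older pandas versions:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame from its printed table (assumption:
# same values, same column order, same surviving row labels).
cols = pd.MultiIndex.from_product(
    [["ECFP", "FCFP", "RDK5"], ["R", "tau", "RMSEc"]]
)
data = np.array([
    [0.1, 0.4, 1.2, 0.4, 0.2, 1.0, 0.4, 0.6, 1.2],
    [0.3, 0.3, 1.3, 0.3, 0.3, 1.2, 0.7, 0.5, 1.5],
    [0.5, 0.3, 1.1, 0.4, 0.3, 1.0, 0.5, 0.3, 1.3],
    [0.6, 0.4, 1.5, 0.1, 0.5, 1.2, 0.6, 0.3, 1.5],
    [0.4, 0.3, 1.0, 0.2, 0.6, 1.0, 0.6, 0.5, 1.0],
])
df = pd.DataFrame(data, index=[0, 1, 2, 4, 5], columns=cols)

# Sort the column MultiIndex so that slicing works on older pandas too.
df = df.sort_index(axis=1)

# Select every 'R' column, then drop the now-redundant second level
# so the result is labelled only by ECFP / FCFP / RDK5.
r_values = df.loc[:, pd.IndexSlice[:, "R"]].droplevel(1, axis=1)

# Pearson correlation between the three 'R' columns.
corr = r_values.corr()
print(corr)
```

Dropping the second level is optional; without it the correlation matrix keeps ('ECFP', 'R')-style labels on both axes.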
Answer 1 (score: 4)
Use slice:
df.loc[:,(slice(None),'R')]
Out[375]:
   ECFP  FCFP  RDK5
      R     R     R
0   0.1   0.4   0.4
1   0.3   0.3   0.7
2   0.5   0.4   0.5
4   0.6   0.1   0.6
5   0.4   0.2   0.6
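As far as I can tell, the slice(None) tuple and pd.IndexSlice are two spellings of the same key, and DataFrame.xs is another way to pull out one label from a column level. The sketch below checks that on a DataFrame rebuilt from the question's printed table. The variable names are mine, and xs is an alternative I am adding, not something the answer above proposed:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame from its printed table (assumption:
# same values, same column order, same surviving row labels).
cols = pd.MultiIndex.from_product(
    [["ECFP", "FCFP", "RDK5"], ["R", "tau", "RMSEc"]]
)
data = np.array([
    [0.1, 0.4, 1.2, 0.4, 0.2, 1.0, 0.4, 0.6, 1.2],
    [0.3, 0.3, 1.3, 0.3, 0.3, 1.2, 0.7, 0.5, 1.5],
    [0.5, 0.3, 1.1, 0.4, 0.3, 1.0, 0.5, 0.3, 1.3],
    [0.6, 0.4, 1.5, 0.1, 0.5, 1.2, 0.6, 0.3, 1.5],
    [0.4, 0.3, 1.0, 0.2, 0.6, 1.0, 0.6, 0.5, 1.0],
])
df = pd.DataFrame(data, index=[0, 1, 2, 4, 5], columns=cols)
df = df.sort_index(axis=1)  # precaution for older pandas versions

# pd.IndexSlice[:, 'R'] simply builds the tuple (slice(None), 'R').
same_key = pd.IndexSlice[:, "R"] == (slice(None), "R")

by_slice = df.loc[:, (slice(None), "R")]
by_indexslice = df.loc[:, pd.IndexSlice[:, "R"]]
same_result = by_slice.equals(by_indexslice)
print(same_key, same_result)

# Alternative: xs takes a cross-section on level 1 of the columns and
# drops that level, leaving plain ECFP / FCFP / RDK5 column labels.
r_only = df.xs("R", axis=1, level=1)
print(r_only.corr())
```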
Answer 2 (score: 3)
Both answers work, but I first had to lexsort the columns, otherwise I got this error:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
The solution is:
# DataFrame.sortlevel was removed in pandas 1.0; sort_index is its replacement
df = df.sort_index(axis=1)
print("Correlation matrix of Pearson's R values among all feature vector types:")
print(df.loc[:, pd.IndexSlice[:, 'R']].corr())
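If you want to see whether the columns actually need sorting, one option is to look at the MultiIndex's is_monotonic_increasing property before and after the sort. This is a rough sketch on a DataFrame rebuilt from the question's printed table. The variable names are mine, and whether the unsorted frame raises the KeyError above may depend on your pandas version:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame from its printed table (assumption:
# same values, same column order, same surviving row labels).
cols = pd.MultiIndex.from_product(
    [["ECFP", "FCFP", "RDK5"], ["R", "tau", "RMSEc"]]
)
data = np.array([
    [0.1, 0.4, 1.2, 0.4, 0.2, 1.0, 0.4, 0.6, 1.2],
    [0.3, 0.3, 1.3, 0.3, 0.3, 1.2, 0.7, 0.5, 1.5],
    [0.5, 0.3, 1.1, 0.4, 0.3, 1.0, 0.5, 0.3, 1.3],
    [0.6, 0.4, 1.5, 0.1, 0.5, 1.2, 0.6, 0.3, 1.5],
    [0.4, 0.3, 1.0, 0.2, 0.6, 1.0, 0.6, 0.5, 1.0],
])
df = pd.DataFrame(data, index=[0, 1, 2, 4, 5], columns=cols)

# The second level comes in the order R, tau, RMSEc, which is not
# lexicographic, so the column index is not sorted yet.
sorted_before = df.columns.is_monotonic_increasing

# Non-inplace sort: returns a new DataFrame with sorted columns.
df_sorted = df.sort_index(axis=1)
sorted_after = df_sorted.columns.is_monotonic_increasing
print(sorted_before, sorted_after)

# With sorted columns the IndexSlice selection is safe to use.
corr = df_sorted.loc[:, pd.IndexSlice[:, "R"]].corr()
print(corr)
```

Note that sorting reorders the inner level to R, RMSEc, tau, so the column order of the full frame changes even though the values do not.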