Python Pandas: Get the same column from all keys of a concatenated DataFrame (with MultiIndex)

Date: 2013-08-01 19:46:19

Tags: python pandas

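Given a DataFrame created by concatenating other DataFrames that have exactly the same columns/rows, how can I get a given column across all keys?

Here is a concrete example: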

In [9]: df = pd.DataFrame(np.random.randn(nrow, ncol), columns=list(string.ascii_uppercase[:ncol]))

In [10]: df
Out[10]: 
          A         B         C         D         E
0 -2.445083  0.020886 -0.518002 -1.087649 -2.457616
1 -0.834116 -0.000645 -0.052698  1.017388  0.977475
2 -0.043448  0.348393 -0.846228 -1.144556  1.472701
3  0.359526 -1.723547 -1.659162  0.173996  0.315652
4  1.100312 -0.681820 -1.065581  0.153885  0.398029
5 -2.992605  0.322006  0.097947 -0.514609 -0.871674
6  1.981342  0.147712  0.497502  0.547683  1.070719
7  0.281246 -0.198311  0.564416 -0.762356  0.763791
8 -0.913407  0.927109  0.348485  3.364223  2.602642
9 -0.644116  2.095727  1.125958  0.296914 -0.420522

In [11]: pieces = []

In [12]: for i in range(4):
   ....:     pieces.append(pd.DataFrame(np.random.randn(nrow, ncol), columns=list(string.ascii_uppercase[:ncol])))
   ....:     

In [13]: df_concat = pd.concat(pieces, keys=['W','X','Y','Z'], axis=1)

In [14]: df_concat

Out[14]: 
          W                                                 X            \
          A         B         C         D         E         A         B   
0 -0.505484 -0.457853 -0.990727 -0.780617  1.215694  0.450981 -1.633229   
1  0.116248  0.235593 -0.339177  0.358038  0.583175  1.699095 -0.238950   
2 -0.000709 -2.145297  1.041371 -0.046306  0.308357  1.098283  0.020833   
3  0.301729 -0.385389 -0.247188 -1.212048  1.344364  0.271609 -0.570161   
4 -0.965596  0.030255  0.677786 -0.272460  0.074819 -1.129305 -1.367137   
5  0.712317 -0.888795 -1.096789 -0.606129 -1.048819 -2.629423  1.298547   
6 -0.743539  0.040812 -0.802773  0.743799  0.430384 -0.902586  0.082162   
7 -0.587438 -1.298439 -1.130855 -1.860293  1.802137 -0.071374  2.002444   
8  0.060809 -0.279892  0.316728  0.413448 -0.564599 -0.127618  0.628813   
9  1.142441  1.224539  0.572980  0.037514 -0.513964 -1.026794  0.899758   

                                        Y                                \
          C         D         E         A         B         C         D   
0 -0.953875 -0.656037 -1.083118 -0.706460 -0.542555  0.028699  1.100427   
1 -0.812239 -0.758029  0.365095  0.132736  1.161346 -1.372225 -1.780733   
2 -1.347575  1.524654  0.031564  0.651127 -0.751353  0.770411  0.317422   
3 -1.269158  0.590106  0.007470 -1.068919 -0.748173 -0.495151  0.304920   
4  0.488790 -0.067784 -1.154394 -1.795902  0.315138 -0.243877  0.698870   
5  0.296125 -0.010721  0.984436 -1.692544  0.703791  0.898088  2.379869   
6  1.580341 -0.984228 -1.141533 -0.950717 -1.158840  0.149764  1.136630   
7  1.216956 -0.429757  0.376067  0.417440  0.331015 -0.837385 -0.984118   
8 -1.508074 -0.483468  0.297295  0.253952 -0.356498 -0.193768 -0.954337   
9 -0.951482 -0.020037 -1.888375 -1.052739 -0.996700 -0.758079 -0.239132   

                    Z                                          
          E         A         B         C         D         E  
0 -0.736567  1.451512 -0.877736 -0.826044  0.850919  0.005778  
1 -0.327570 -1.706155  1.359768  0.808397  1.697910  0.109116  
2  0.932116  0.361915 -0.460502 -0.344834 -1.792748  0.722837  
3  0.567515 -0.440755  0.850031 -0.091985 -0.296515 -0.078628  
4  0.210144  0.617150  1.017416  0.552831 -1.757228  0.983008  
5 -0.134114 -1.137423  0.256443 -1.015701  0.972131 -1.686675  
6  0.376023 -0.195116  2.127337 -0.687416  0.425428  2.378165  
7 -0.082692  1.686996 -1.857700  0.638241  0.551779 -0.486632  
8  2.148983  0.188987 -0.387614  0.833069  1.240079 -0.031077  
9 -1.278626 -1.219897 -0.173212 -0.119734 -0.244129  1.940811 
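The session above leaves out its imports and the values of nrow and ncol. A minimal self-contained reconstruction, assuming Python 3 (string.ascii_uppercase instead of the Python 2 string.uppercase) and nrow = 10, ncol = 5 as inferred from the printed output, shows that the keys argument becomes the outer level of a two-level column MultiIndex:

```python
import string

import numpy as np
import pandas as pd

# Shape inferred from the output above: 10 rows, columns A-E.
nrow, ncol = 10, 5
cols = list(string.ascii_uppercase[:ncol])  # ['A', 'B', 'C', 'D', 'E']

# Four frames with identical rows/columns, as in the loop above.
pieces = [pd.DataFrame(np.random.randn(nrow, ncol), columns=cols) for _ in range(4)]

# `keys` becomes the outer level of a two-level column MultiIndex.
df_concat = pd.concat(pieces, keys=['W', 'X', 'Y', 'Z'], axis=1)

print(df_concat.shape)                 # (10, 20)
print(df_concat.columns.nlevels)       # 2
print(df_concat.columns[:3].tolist())  # [('W', 'A'), ('W', 'B'), ('W', 'C')]
```

Every column label is therefore a (key, column) tuple, which is what makes a single-call selection of 'A' across all keys possible.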

How do I get the 'A' column for all keys? I tried doing the same thing with a Panel, but again, it requires the first key. What if I just want all the keys?

In [18]: p = pd.Panel.from_dict(dict(zip(['W','X','Y','Z'], [pd.DataFrame(np.random.randn(nrow, ncol), columns=list(string.uppercase[:ncol])) for i in range(4)])))

In [19]: p
Out[19]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 10 (major_axis) x 5 (minor_axis)
Items axis: W to Z
Major_axis axis: 0 to 9
Minor_axis axis: A to E
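pd.Panel was deprecated in pandas 0.20 and later removed, so the snippet above will not run on current versions. A minimal sketch of an equivalent structure, assuming a recent pandas and the same 10x5 frames: pass a dict of DataFrames to pd.concat, and the dict keys play the role of the Panel items by becoming the outer column level.

```python
import string

import numpy as np
import pandas as pd

nrow, ncol = 10, 5
cols = list(string.ascii_uppercase[:ncol])
keys = ['W', 'X', 'Y', 'Z']

# Dict keys stand in for the Panel "items" axis.
frames = {k: pd.DataFrame(np.random.randn(nrow, ncol), columns=cols) for k in keys}

# Concatenating a dict along axis=1 uses its keys as the outer column level,
# giving the same layout as pd.concat(pieces, keys=keys, axis=1).
wide = pd.concat(frames, axis=1)

print(wide.shape)                        # (10, 20)
print(wide.columns.levels[0].tolist())   # ['W', 'X', 'Y', 'Z']
```

Selecting one outer key (for example wide['X']) gives back the original 10x5 frame, so this covers what the Panel items axis was used for here.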

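The final output I want is a 10x4 DataFrame: all rows by the 'A' column of every sample. So far, what I have been doing is manually extracting one column from each DataFrame and then concatenating them to form a 10x4 DataFrame, e.g.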

In [35]: a_pieces = [df_concat[x].loc[:, 'A'] for x in ['W','X','Y','Z']]

In [36]: a_concat = pd.concat(a_pieces, keys=['W','X','Y','Z'], axis=1)

In [37]: a_concat
Out[37]: 
          W         X         Y         Z
0 -0.505484  0.450981 -0.706460  1.451512
1  0.116248  1.699095  0.132736 -1.706155
2 -0.000709  1.098283  0.651127  0.361915
3  0.301729  0.271609 -1.068919 -0.440755
4 -0.965596 -1.129305 -1.795902  0.617150
5  0.712317 -2.629423 -1.692544 -1.137423
6 -0.743539 -0.902586 -0.950717 -0.195116
7 -0.587438 -0.071374  0.417440  1.686996
8  0.060809 -0.127618  0.253952  0.188987
9  1.142441 -1.026794 -1.052739 -1.219897
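The manual approach can at least avoid hard-coding the keys. A sketch, assuming the same setup as above: it reads the outer keys from the column index, uses .loc in place of .ix (which later pandas versions removed), and passes a dict to pd.concat so the keys become the result's column names.

```python
import string

import numpy as np
import pandas as pd

nrow, ncol = 10, 5
cols = list(string.ascii_uppercase[:ncol])
keys = ['W', 'X', 'Y', 'Z']
pieces = [pd.DataFrame(np.random.randn(nrow, ncol), columns=cols) for _ in keys]
df_concat = pd.concat(pieces, keys=keys, axis=1)

# Read the outer keys from the column index instead of hard-coding them.
outer = df_concat.columns.get_level_values(0).unique()

# Same idea as the loop above; the dict keys become the result's column names.
a_concat = pd.concat({k: df_concat[k].loc[:, 'A'] for k in outer}, axis=1)

print(a_concat.shape)              # (10, 4)
print(a_concat.columns.tolist())   # ['W', 'X', 'Y', 'Z']
```

This still builds one Series per key in Python; the answers below do the same selection directly on the MultiIndex.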

2 answers:

Answer 0 (score: 2)

You should be able to take the slice out with xs:

df_concat.xs('A', level=1, axis=1)
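A self-contained sketch of this answer, assuming the same 10x5 frames and keys W to Z as in the question. xs with level=1 matches 'A' on the inner column level for every outer key; by default that level is dropped, and drop_level=False keeps the full (key, 'A') labels instead.

```python
import string

import numpy as np
import pandas as pd

nrow, ncol = 10, 5
cols = list(string.ascii_uppercase[:ncol])
keys = ['W', 'X', 'Y', 'Z']
pieces = [pd.DataFrame(np.random.randn(nrow, ncol), columns=cols) for _ in keys]
df_concat = pd.concat(pieces, keys=keys, axis=1)

# Select label 'A' from the inner column level (level=1) across every outer key.
a_cols = df_concat.xs('A', level=1, axis=1)
print(a_cols.columns.tolist())  # ['W', 'X', 'Y', 'Z']

# drop_level=False keeps the full (key, 'A') tuples as column labels.
a_full = df_concat.xs('A', level=1, axis=1, drop_level=False)
print(a_full.columns.tolist())  # [('W', 'A'), ('X', 'A'), ('Y', 'A'), ('Z', 'A')]
```

The result is the 10x4 frame from the question in one call, with no loop over the keys.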

Answer 1 (score: 1)

Is this what you are looking for?

In [20]: df_concat.swaplevel(1,0,axis=1)['A']
Out[20]: 
          W         X         Y         Z
0 -1.040162  0.220310  0.493406  0.224235
1  0.093167  1.554220  1.626530  0.068452
2  0.700489  0.563523  0.882834  0.263289
3  0.148377  0.012024 -0.871754  0.428075
4 -0.812572 -0.194886  1.234637  1.174096
5 -0.226345 -0.211326  0.688867 -0.992412
6 -1.348947 -1.319374 -0.693617  1.069359
7 -0.336275  1.191541  0.681850  0.259941
8 -1.029588 -1.260796  0.184852 -0.136066
9  0.115574 -0.075612  0.777306 -0.874591
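(The values above differ from the question's only because the answerer generated new random data.) A sketch of this approach, assuming the same setup as in the question, that also checks it agrees with the xs answer. Swapping the levels leaves the columns unsorted, so the sketch adds sort_index(axis=1); without it, pandas may emit a PerformanceWarning when selecting from the unsorted MultiIndex.

```python
import string

import numpy as np
import pandas as pd

nrow, ncol = 10, 5
cols = list(string.ascii_uppercase[:ncol])
keys = ['W', 'X', 'Y', 'Z']
pieces = [pd.DataFrame(np.random.randn(nrow, ncol), columns=cols) for _ in keys]
df_concat = pd.concat(pieces, keys=keys, axis=1)

# Swap the column levels so 'A' is on the outer level, re-sort, then select it.
swapped = df_concat.swaplevel(0, 1, axis=1).sort_index(axis=1)['A']

# The xs() answer selects the same columns without reshaping the index.
via_xs = df_concat.xs('A', level=1, axis=1)

print(swapped.columns.tolist())                              # ['W', 'X', 'Y', 'Z']
print(np.array_equal(swapped.to_numpy(), via_xs.to_numpy()))  # True
```

Both give the same 10x4 result; xs avoids building a reordered copy of the whole column index.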