Pandas: compare MultiIndex DataFrames without looping

Time: 2018-09-27 10:45:57

Tags: python python-3.x pandas

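I want to compare two MultiIndex DataFrames and add a column showing the difference in values wherever all index levels match between the first and second DataFrame, without using a loop.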

import numpy as np
import pandas as pd

index_a = [1,2,2,3,3,3]
index_b = [0,0,1,0,1,2]
index_c = [1,2,2,4,4,4]
index = pd.MultiIndex.from_arrays([index_a,index_b], names=('a','b'))
index_1 = pd.MultiIndex.from_arrays([index_c,index_b], names=('a','b'))
df1 = pd.DataFrame(np.random.rand(6,), index=index, columns=['p'])
df2 = pd.DataFrame(np.random.rand(6,), index=index_1, columns=['q'])

df1

    p
a b 
1 0 .4655

2 0 .8600
  1 .9010

3 0 .0652
  1 .5686
  2 .8965

df2

    q
a b
1 0 .6591

2 0 .5684
  1 .5689

4 0 .9898
  1 .3656
  2 .6989 

The resulting frame (df1 - df2) should look like:

        p  diff
a b 
1 0 .4655  -0.1936 

2 0 .8600   .2916
  1 .9010   .3321   

3 0 .0652    No Match
  1 .5686    No Match
  2 .8965    No Match

2 answers:

Answer 0 (score: 3)

Use reindex_like or reindex to align df2 to df1's index, so that only labels present in both indices produce a value:

df1['new'] = (df1['p'] - df2['q'].reindex_like(df1)).fillna('No Match')
# alternative: reindex explicitly against df1's index
#df1['new'] = (df1['p'] - df2['q'].reindex(df1.index)).fillna('No Match')
print(df1)
            p       new
a b                    
1 0  0.955587  0.924466
2 0  0.312497 -0.310224
  1  0.306256  0.231646
3 0  0.575613  No Match
  1  0.674605  No Match
  2  0.462807  No Match
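
Because the output above comes from random data, here is a reproducible sketch of the reindex variant using the fixed values from the question. The `diff` and `matched` column names are my own choice, not part of the answer. Note that `fillna('No Match')` turns the column into object dtype; keeping the difference numeric and putting the match status in a separate boolean column is one way around that, assuming you do not need the literal string in the output.

```python
import pandas as pd

# Values copied from the question so the result is reproducible.
idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Align q to df1's index; labels that are missing from df2 become NaN.
q_aligned = df2["q"].reindex(df1.index)

# Numeric difference (NaN where there is no match) plus a boolean flag
# ("diff" and "matched" are illustrative names).
df1["diff"] = df1["p"] - q_aligned
df1["matched"] = df1.index.isin(df2.index)
print(df1)
```

If the 'No Match' text is required for display, it can still be produced at the end with `df1['diff'].fillna('No Match')`.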

Another idea, using Index.intersection with DataFrame.loc:

df1['new'] = (df1['p'] - df2.loc[df2.index.intersection(df1.index), 'q']).fillna('No Match')
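
To see what the intersection step does on its own, here is a small sketch with the question's fixed values (the variable names `common` and `q_common` are mine). The subtraction then aligns on df1's index, so labels outside the intersection come out as NaN.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Labels (a, b) present in both frames.
common = df2.index.intersection(df1.index)
print(common.tolist())

# Select only the shared rows of q, then subtract; pandas aligns on the index,
# so df1 rows without a partner in q_common get NaN.
q_common = df2.loc[common, "q"]
diff = df1["p"] - q_common
print(diff)
```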

Or use merge with a left join:

df = pd.merge(df1, df2, how='left', left_index=True, right_index=True)
df['new'] = (df['p'] - df['q']).fillna('No Match')
print (df)
            p         q       new
a b                              
1 0  0.789693  0.665148  0.124544
2 0  0.082677  0.814190 -0.731513
  1  0.762339  0.235435  0.526905
3 0  0.727695       NaN  No Match
  1  0.903596       NaN  No Match
  2  0.315999       NaN  No Match
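
A possible variation on the merge approach, sketched with the question's fixed values: passing `indicator=True` adds a `_merge` column that records whether each row was found in both frames, so the match status does not have to be inferred from NaN. The `diff` and `matched` column names are illustrative.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Left join on the index; "_merge" is "both" for matched rows and
# "left_only" for rows that exist only in df1.
merged = pd.merge(df1, df2, how="left", left_index=True, right_index=True, indicator=True)
merged["diff"] = merged["p"] - merged["q"]
merged["matched"] = merged["_merge"] == "both"
print(merged.drop(columns="_merge"))
```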

Answer 1 (score: 0)

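Use the following to get the difference for matching index labels. Labels that do not match will be NaN, and the result is indexed by the union of both indices: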

diff = df1['p'] - df2['q']

#Output
a  b
1  0   -0.666542
2  0   -0.389033
   1    0.064986
3  0         NaN
   1         NaN
   2         NaN
4  0         NaN
   1         NaN
   2         NaN
dtype: float64
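
As the output shows, plain subtraction also keeps the a=4 labels that exist only in df2. If only df1's rows are wanted, one option is to reindex the result back to df1's index. A minimal sketch with the question's fixed values (`diff_all` and `diff_df1` are names I made up):

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Subtraction aligns on the union of both indices: 9 labels in total.
diff_all = df1["p"] - df2["q"]

# Keep only the labels that belong to df1: 6 labels, NaN where df2 has no match.
diff_df1 = diff_all.reindex(df1.index)
print(diff_df1)
```

Assigning the unrestricted result directly, as in `df1['diff'] = df1['p'] - df2['q']`, has the same effect, because column assignment aligns on df1's index.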