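I want to compare two MultiIndex DataFrames and add another column showing the difference in values, wherever all index levels match between the first and the second DataFrame, without using loops.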
import numpy as np
import pandas as pd

index_a = [1,2,2,3,3,3]
index_b = [0,0,1,0,1,2]
index_c = [1,2,2,4,4,4]
index = pd.MultiIndex.from_arrays([index_a,index_b], names=('a','b'))
index_1 = pd.MultiIndex.from_arrays([index_c,index_b], names=('a','b'))
df1 = pd.DataFrame(np.random.rand(6,), index=index, columns=['p'])
df2 = pd.DataFrame(np.random.rand(6,), index=index_1, columns=['q'])
df1
         p
a b
1 0  .4655
2 0  .8600
  1  .9010
3 0  .0652
  1  .5686
  2  .8965
df2
         q
a b
1 0  .6591
2 0  .5684
  1  .5689
4 0  .9898
  1  .3656
  2  .6989
The resulting frame (df1 - df2) should look like this:
         p      diff
a b
1 0  .4655   -0.1936
2 0  .8600     .2916
  1  .9010     .3321
3 0  .0652  No Match
  1  .5686  No Match
  2  .8965  No Match
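Answer 0 (score: 3)
Use reindex_like or reindex to align df2['q'] to df1's index, so that only the index entries present in both frames get a value: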
df1['new'] = (df1['p'] - df2['q'].reindex_like(df1)).fillna('No Match')
#alternative
#df1['new'] = (df1['p'] - df2['q'].reindex(df1.index)).fillna('No Match')
print (df1)
            p       new
a b
1 0  0.955587  0.924466
2 0  0.312497 -0.310224
  1  0.306256  0.231646
3 0  0.575613  No Match
  1  0.674605  No Match
  2  0.462807  No Match
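For a reproducible check, here is a minimal sketch of the reindex approach that uses the fixed values from the question instead of random numbers. The column name diff and the helper variable aligned_q are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

# Fixed values copied from the question so the result is reproducible.
idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Align q to df1's index: (a, b) pairs missing from df2 become NaN.
aligned_q = df2["q"].reindex(df1.index)

# Subtract, then label the unmatched rows. The column becomes object dtype
# because it mixes floats and strings.
df1["diff"] = (df1["p"] - aligned_q).fillna("No Match")
print(df1)
```

If the column should stay numeric for later calculations, it may be preferable to skip the fillna step and keep NaN as the "no match" marker.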
Another idea, using Index.intersection and DataFrame.loc:
df1['new'] = (df1['p'] - df2.loc[df2.index.intersection(df1.index), 'q']).fillna('No Match')
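The one-liner above can be unpacked into separate steps. This is only a sketch with the question's fixed values; the names common and diff are assumptions introduced here, and the explicit reindex back to df1's index is my addition to make the alignment visible.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# (a, b) pairs that exist in both frames.
common = df1.index.intersection(df2.index)

# Difference computed only on the shared rows.
diff = df1.loc[common, "p"] - df2.loc[common, "q"]

# Expand back to df1's full index; rows outside the intersection become NaN,
# which are then replaced by the label.
df1["diff"] = diff.reindex(df1.index).fillna("No Match")
print(df1)
```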
Or use merge with a left join:
df = pd.merge(df1, df2, how='left', left_index=True, right_index=True)
df['new'] = (df['p'] - df['q']).fillna('No Match')
print (df)
            p         q       new
a b
1 0  0.789693  0.665148  0.124544
2 0  0.082677  0.814190 -0.731513
  1  0.762339  0.235435  0.526905
3 0  0.727695       NaN  No Match
  1  0.903596       NaN  No Match
  2  0.315999       NaN  No Match
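One caveat with the merge approach: fillna cannot tell a row that is missing from df2 apart from a row whose q is itself NaN. A possible variation, sketched here with the question's fixed values, is to pass indicator=True to pd.merge and label rows by the resulting _merge column. The names merged, matched and diff are my own assumptions, not part of the original answer.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Left join on the index; _merge records whether each row was found in df2.
merged = pd.merge(df1, df2, how="left", left_index=True, right_index=True, indicator=True)
matched = merged["_merge"] == "both"

# Cast to object first so strings can sit next to floats in the same column.
diff = (merged["p"] - merged["q"]).astype(object)
diff[~matched] = "No Match"

merged["diff"] = diff
merged = merged.drop(columns="_merge")
print(merged)
```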
Answer 1 (score: 0)
Use the following to get the difference for matching index entries. Non-matching entries will be NaN:
diff = df1['p'] - df2['q']
#Output
a  b
1  0   -0.666542
2  0   -0.389033
   1    0.064986
3  0         NaN
   1         NaN
   2         NaN
4  0         NaN
   1         NaN
   2         NaN
dtype: float64
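Note that the plain subtraction aligns on the union of both indexes, which is why the rows for a = 4 (present only in df2) show up in the output. If only df1's rows are wanted, as in the question, one option is to reindex the result back to df1's index. A small sketch with the question's fixed values follows; the name diff_on_df1 is hypothetical.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
idx2 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=("a", "b"))
df1 = pd.DataFrame({"p": [0.4655, 0.8600, 0.9010, 0.0652, 0.5686, 0.8965]}, index=idx1)
df2 = pd.DataFrame({"q": [0.6591, 0.5684, 0.5689, 0.9898, 0.3656, 0.6989]}, index=idx2)

# Series arithmetic aligns on the union of the two MultiIndexes (9 rows here).
diff = df1["p"] - df2["q"]

# Keep only the rows that belong to df1 (6 rows, 3 of them NaN).
diff_on_df1 = diff.reindex(df1.index)
print(diff_on_df1)
```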