I have a base DataFrame like this:
import pandas as pd
df1_data = {'id': {0: '101', 1: '102', 2: '103', 3: '104', 4: '105'},
            'sym1': {0: 'abc', 1: 'pqr', 2: 'xyz', 3: 'mno', 4: 'lmn'}}
df1 = pd.DataFrame(df1_data)
print(df1)
    id sym1
0  101  abc
1  102  pqr
2  103  xyz
3  104  mno
4  105  lmn
For this DataFrame, I want to check whether the values in column sym1 appear in a column of four other DataFrames.
The four other DataFrames:
df2_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'a',1:'b',2:'c',3:'d'}}
df2 = pd.DataFrame(df2_data)
print(df2)
df3_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'h',1:'i',2:'k',3:'l'}}
df3 = pd.DataFrame(df2_data)
print(df3)
df4_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'p',1:'q',2:'r',3:'s'}}
df4 = pd.DataFrame(df4_data)
print(df4)
df5_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'w',1:'x',2:'y',3:'z'}}
df5 = pd.DataFrame(df5_data)
print(df5)
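The sym2 column in df2, df3, df4, and df5 may or may not contain the same symbols. My goal is to check whether the sym1 values occur among the sym2 values of df2, df3, df4, and df5, and to list the ones that do not.
Expected output: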
    id sym1
0  102  pqr
1  105  lmn
Conclusion:
The symbols pqr and lmn are not present in the sym2 column of df2, df3, df4, or df5.
Answer 0 (score: 5)
isin checks whether each element of df1.sym1 is contained in another iterable.
pd.concat joins all the other DataFrames together.
df1[~df1.sym1.isin(pd.concat([df2, df3, df4, df5]).sym2)]
    id sym1
1  102  pqr
4  105  lmn
A numpy variant, about 3 times faster:
import numpy as np
df1[~df1.sym1.isin(np.concatenate([d.sym2.values for d in [df2, df3, df4, df5]]))]
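For reference, here is a minimal self-contained sketch of the isin approach using the question's data. It assumes only the sym2 columns matter, so the name columns are left out and one sym2 list is reused for all four frames. The names others, all_sym2, and missing are illustrative. The reset_index(drop=True) call is not part of the answer above; it is added only to get the 0, 1 index shown in the question's expected output.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['101', '102', '103', '104', '105'],
                    'sym1': ['abc', 'pqr', 'xyz', 'mno', 'lmn']})

# In the question all four lookup frames hold the same sym2 values,
# so this sketch builds them from one list (an assumption for brevity).
others = [pd.DataFrame({'sym2': ['abc', 'xxx', 'xyz', 'mno']}) for _ in range(4)]

# Stack every sym2 value into one Series, then keep the df1 rows
# whose sym1 is NOT among those values.
all_sym2 = pd.concat(others)['sym2']
missing = df1[~df1['sym1'].isin(all_sym2)].reset_index(drop=True)

print(missing)
```

Without reset_index the surviving rows keep their original labels 1 and 4, as in the answer's output.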
Answer 1 (score: 4)
Another solution, which does the comparison with merge and its indicator parameter:
dfs = [df2,df3,df4,df5]
df = pd.concat(dfs, keys=['df2','df3','df4','df5'])
print (df)
      name sym2
df2 0    a  abc
    1    b  xxx
    2    c  xyz
    3    d  mno
df3 0    a  abc
    1    b  xxx
    2    c  xyz
    3    d  mno
df4 0    p  abc
    1    q  xxx
    2    r  xyz
    3    s  mno
df5 0    w  abc
    1    x  xxx
    2    y  xyz
    3    z  mno
merged = pd.merge(df.rename_axis(['dfs','idx']).reset_index(),
                  df1,
                  left_on='sym2',
                  right_on='sym1',
                  how='outer',
                  indicator=True)
print (merged)
    dfs  idx name sym2   id sym1      _merge
0   df2  0.0    a  abc  101  abc        both
1   df3  0.0    a  abc  101  abc        both
2   df4  0.0    p  abc  101  abc        both
3   df5  0.0    w  abc  101  abc        both
4   df2  1.0    b  xxx  NaN  NaN   left_only
5   df3  1.0    b  xxx  NaN  NaN   left_only
6   df4  1.0    q  xxx  NaN  NaN   left_only
7   df5  1.0    x  xxx  NaN  NaN   left_only
8   df2  2.0    c  xyz  103  xyz        both
9   df3  2.0    c  xyz  103  xyz        both
10  df4  2.0    r  xyz  103  xyz        both
11  df5  2.0    y  xyz  103  xyz        both
12  df2  3.0    d  mno  104  mno        both
13  df3  3.0    d  mno  104  mno        both
14  df4  3.0    s  mno  104  mno        both
15  df5  3.0    z  mno  104  mno        both
16  NaN  NaN  NaN  NaN  102  pqr  right_only
17  NaN  NaN  NaN  NaN  105  lmn  right_only
print (merged.loc[merged['_merge']=='right_only', ['id','sym1']])
     id sym1
16  102  pqr
17  105  lmn
print (merged.loc[merged['_merge']=='left_only', ['dfs', 'sym2']])
   dfs sym2
4  df2  xxx
5  df3  xxx
6  df4  xxx
7  df5  xxx
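If only the unmatched rows of df1 are needed, the same indicator idea could be sketched as a left merge against the de-duplicated sym2 values. This is a variation on the answer above, not the answerer's code. It assumes the name columns can be ignored, and the names others, lookup, and missing are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['101', '102', '103', '104', '105'],
                    'sym1': ['abc', 'pqr', 'xyz', 'mno', 'lmn']})

# Lookup frames reduced to their sym2 column (an assumption of this sketch).
others = [pd.DataFrame({'sym2': ['abc', 'xxx', 'xyz', 'mno']}) for _ in range(4)]

# De-duplicate sym2 first so the left merge cannot repeat df1 rows.
lookup = pd.concat(others)[['sym2']].drop_duplicates()

# how='left' keeps every df1 row; _merge marks rows with no sym2 match
# as 'left_only'.
merged = df1.merge(lookup, left_on='sym1', right_on='sym2',
                   how='left', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', ['id', 'sym1']]

print(missing)
```

Compared with the outer merge, this keeps one row per df1 entry, at the cost of no longer reporting which sym2 values are absent from df1.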