Same as "python pandas: how to find rows in one dataframe but not in another?" but with multiple columns.
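This is the setup:
import pandas as pd
df = pd.DataFrame(dict(
    col1=[0,1,1,2],
    col2=['a','b','c','b'],
    extra_col=['this','is','just','something']
))
other = pd.DataFrame(dict(
    col1=[1,2],
    col2=['b','c']
))
Now I want to select the rows from df that do not exist in other. I want to do the selection by col1 and col2.
In SQL I would do:
select * from df
where not exists (
    select * from other o
    where df.col1 = o.col1 and
    df.col2 = o.col2
)
In pandas I can do something similar, but it feels very ugly. Part of the ugliness could be avoided if df had an id column, but that is not always available.
So maybe there is some more elegant way?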
Answer 0 (score: 24)
Since pandas 0.17.0 there is an indicator parameter that you can pass to merge, which tells you whether each row is present only in the left frame, only in the right frame, or in both:
In [5]:
merged = df.merge(other, how='left', indicator=True)
merged
Out[5]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
1     1    b         is       both
2     1    c       just  left_only
3     2    b  something  left_only
In [6]:
merged[merged['_merge']=='left_only']
Out[6]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
2     1    c       just  left_only
3     2    b  something  left_only
So you can now filter the merged df by selecting only the 'left_only' rows.
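The merge-and-filter steps above could be wrapped in a small helper. This is only a sketch: the function name anti_join and its on argument are hypothetical, not part of pandas. It assumes that restricting other to the key columns and dropping duplicate keys is what you want, so that the merge cannot multiply rows of df or pull in unrelated columns.

```python
import pandas as pd


def anti_join(left, right, on):
    """Return the rows of `left` whose `on` values have no match in `right`.

    Hypothetical helper, not a pandas function.
    """
    # Keep only the key columns of `right` and drop duplicate keys,
    # so the left merge yields exactly one output row per row of `left`.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # Rows tagged 'left_only' exist in `left` but not in `right`.
    only_left = merged.loc[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge")


df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=["a", "b", "c", "b"],
    extra_col=["this", "is", "just", "something"],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=["b", "c"],
))

result = anti_join(df, other, ["col1", "col2"])
print(result)
```

Note that merge builds a new index for its result, so if df carries a meaningful index it will not survive this approach as written.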
Answer 1 (score: 4)
Interesting.
cols = ['col1', 'col2']
# Get copies whose indexes are the columns of interest
df2 = df.set_index(cols)
other2 = other.set_index(cols)
# Keep the rows of df whose (col1, col2) key is not in other; ~ negates the isin mask
df[~df2.index.isin(other2.index)]
Returns:
   col1 col2  extra_col
0     0    a       this
2     1    c       just
3     2    b  something
Looks more elegant...
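A possible variation on the same idea, assuming pandas 0.24 or later (where MultiIndex.from_frame is available): build the key indexes directly from the key columns instead of re-indexing copies of the whole frames. The variable names left_keys, right_keys and mask are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=["a", "b", "c", "b"],
    extra_col=["this", "is", "just", "something"],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=["b", "c"],
))

cols = ["col1", "col2"]

# Build a MultiIndex of (col1, col2) keys for each frame.
left_keys = pd.MultiIndex.from_frame(df[cols])
right_keys = pd.MultiIndex.from_frame(other[cols])

# True where the key of a df row does not appear among other's keys.
mask = ~left_keys.isin(right_keys)

# Boolean indexing keeps the original index labels of df.
result = df[mask]
print(result)
```

Unlike the merge-based approach, this filters df in place with a boolean mask, so the original index labels are preserved.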