df1
|Invoice # |Date |Amount
|12 |12/15/2015 |$10
|13 |12/16/2015 |$11
|14 |12/17/2015 |$12
df2
|Invoice # |Date |Amount
|12 |1/16/2016 |$10
|14 |1/17/2016 |$12
Merged = df1.merge(df2, how='left', on='Invoice #')
|Invoice # |Date |Amount
|12 |12/15/2015 |$10
|NaN |NaN |NaN
|14 |1/17/2016 |$12
What I want to do is take Invoice 13, which comes back with NaN values in the merge, and put it in a list. Any ideas?
Answer 0 (score: 1)
The merged result in your question does not show what a left merge actually produces.
Here's what I get when I try to reproduce what I think you're trying to do (I'm using pandas version 0.19.0):
merged = df1.merge(df2, how='left', on='Invoice #')
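For reference, here is a minimal sketch that rebuilds the two frames from the question and runs the same merge. The frame contents are copied from the tables above; storing the dates and amounts as plain strings is an assumption made only to keep the example short.

```python
import pandas as pd

# Sample data copied from the question; values kept as strings for simplicity.
df1 = pd.DataFrame({
    "Invoice #": [12, 13, 14],
    "Date": ["12/15/2015", "12/16/2015", "12/17/2015"],
    "Amount": ["$10", "$11", "$12"],
})
df2 = pd.DataFrame({
    "Invoice #": [12, 14],
    "Date": ["1/16/2016", "1/17/2016"],
    "Amount": ["$10", "$12"],
})

# A left merge keeps every row of df1. Columns that exist in both frames
# (other than the key) get the default suffixes _x (left) and _y (right).
merged = df1.merge(df2, how="left", on="Invoice #")

print(list(merged.columns))
print(merged)
```

Invoice 13 stays in the result with its df1 values intact; only the `_y` columns coming from df2 are NaN for that row.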
Then you can mask by the missing values and get a dataframe containing those rows:
merged[merged['Amount_y'].isnull()]
Or just create a column with the boolean flag:
merged['missing_from_df2'] = merged['Amount_y'].isnull()
To select things from the masked dataframe, treat it like any other dataframe, and index into one or more columns by listing them (note that if you want more than one, you have to do double brackets).
You can save it to a new variable to make the syntax simpler if you want to do other things with it.
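Putting those steps together, a sketch of going from the merge to a plain Python list of the unmatched invoice numbers might look like this. The variable names `missing_rows`, `missing_invoices`, and `subset` are illustrative, and the sample frames are rebuilt from the question so the block runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Invoice #": [12, 13, 14],
    "Date": ["12/15/2015", "12/16/2015", "12/17/2015"],
    "Amount": ["$10", "$11", "$12"],
})
df2 = pd.DataFrame({
    "Invoice #": [12, 14],
    "Date": ["1/16/2016", "1/17/2016"],
    "Amount": ["$10", "$12"],
})

merged = df1.merge(df2, how="left", on="Invoice #")

# Save the masked frame to its own variable: rows with no match in df2.
missing_rows = merged[merged["Amount_y"].isnull()]

# Single brackets select one column (a Series); tolist() turns it into a list.
missing_invoices = missing_rows["Invoice #"].tolist()

# Double brackets select several columns and return a DataFrame.
subset = missing_rows[["Invoice #", "Date_x"]]

print(missing_invoices)
```

With the sample data, `missing_invoices` holds the single invoice number 13.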
Answer 1 (score: 0)
method 1: pd.concat + drop_duplicates
pd.concat([df1, df2]).drop_duplicates(subset=['Invoice #'])
method 2: combine_first
df1.set_index('Invoice #').combine_first(df2.set_index('Invoice #')).reset_index()
method 3: merge
df1.merge(df2, on='Invoice #', suffixes=['', '_'], how='left')[df1.columns]
method 4: join
df1.join(df2.set_index('Invoice #'), on='Invoice #', rsuffix='_')[df1.columns]
All four produce the same result.
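One way to check this on the question's sample data is sketched below. The helper `as_rows` and the `results` dictionary are illustrative names, and comparing only cell values (ignoring index labels and column order) is an assumption about what counts as "the same" here.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Invoice #": [12, 13, 14],
    "Date": ["12/15/2015", "12/16/2015", "12/17/2015"],
    "Amount": ["$10", "$11", "$12"],
})
df2 = pd.DataFrame({
    "Invoice #": [12, 14],
    "Date": ["1/16/2016", "1/17/2016"],
    "Amount": ["$10", "$12"],
})

# The four approaches from this answer, keyed by a short label.
results = {
    "concat": pd.concat([df1, df2]).drop_duplicates(subset=["Invoice #"]),
    "combine_first": df1.set_index("Invoice #")
                        .combine_first(df2.set_index("Invoice #"))
                        .reset_index(),
    "merge": df1.merge(df2, on="Invoice #", suffixes=("", "_"),
                       how="left")[df1.columns],
    "join": df1.join(df2.set_index("Invoice #"), on="Invoice #",
                     rsuffix="_")[df1.columns],
}

def as_rows(frame):
    """Return cell values as nested lists, with df1's column order and a fresh index."""
    return frame[list(df1.columns)].reset_index(drop=True).values.tolist()

expected = as_rows(df1)
all_match = all(as_rows(frame) == expected for frame in results.values())
print(all_match)
```

On this sample, each method returns df1's own three rows, with df1's values taking priority wherever an invoice number appears in both frames.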
timing: pd.concat + drop_duplicates is the fastest