Python Pandas: How to merge based on an "OR" condition?

Time: 2017-08-24 19:54:00

Tags: python pandas dataframe merge

Suppose I have two dataframes with the following column names:

table 1 columns:
[ShipNumber, TrackNumber, ShipDate, Quantity, Weight]
table 2 columns:
[ShipNumber, TrackNumber, AmountReceived]

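I want to merge the two tables on ShipNumber and TrackNumber. However, if I simply use merge like this (pseudocode, not real code):

tab1.merge(tab2, how="left", on=['ShipNumber', 'TrackNumber'])

then the values in both the ShipNumber and TrackNumber columns must match across the two tables.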
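
To make that behavior concrete, here is a minimal sketch. The values below are made up for illustration and are not from the question. A left merge on both columns fills in AmountReceived only where ShipNumber and TrackNumber match together:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100, 200, 300],
})

# Merging on both keys is an AND condition: only the (S1, T1) row pairs up.
# S2 matches on ShipNumber alone and T3 on TrackNumber alone, so both stay NaN.
both = tab1.merge(tab2, how="left", on=["ShipNumber", "TrackNumber"])
print(both)
```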

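In my case, however, sometimes the ShipNumber values match and sometimes the TrackNumber values match; as long as either of the two values matches for a row, I want the merge to happen.

In other words, if the ShipNumber in row 1 of tab1 matches the ShipNumber in row 3 of tab2, but the TrackNumber values in the two tables do not match, I still want those two rows to be matched.

So basically this is an either/or matching condition (pseudocode):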

if tab1.ShipNumber == tab2.ShipNumber OR tab1.TrackNumber == tab2.TrackNumber:
    then merge

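I hope my question makes sense... Any help is really appreciated!

As suggested, I looked at this post: Python pandas merge with OR logic. However, I don't think it is quite the same problem, because the OP in that post has a mapping file, so they can simply do two merges to solve it. I don't have a mapping file; instead, I have two dfs with the same key columns (ShipNumber, TrackNumber).

2 answers:

Answer 0 (score: 3)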

Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks to @Scott Boston for the final step).

import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})

df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5

With these dataframes we should see that:

  • df1.loc[2] matches df2.loc[0] on A
  • df1.loc[1] matches df2.loc[2] on B
  • df1.loc[3] matches df2.loc[3] on both A and B

We will use suffixes to keep track of what matched where:

suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']

# Keep the result in df so the next step can filter it
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)])

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN

Note that the second and fourth rows are duplicate matches (A = 4 and B = 5 in both dataframes). We need to remove one of them.

# df is the concatenated result of the two merges above
dupes = (df.B_on_A_match_1 == df.B_on_A_match_2)  # could equally use the A_on_B_match columns
df.loc[~dupes]

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
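
The same two-merges-plus-concat idea can be applied to the ShipNumber/TrackNumber tables from the question. The following is only a sketch under assumptions: the sample data and the helper columns `row1` and `row2` are hypothetical, introduced here to identify matched row pairs so that a pair matching on both keys is kept once.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100, 200, 300],
})

# Tag each row with its original position (row1/row2 are hypothetical helper names).
left = tab1.reset_index().rename(columns={"index": "row1"})
right = tab2.reset_index().rename(columns={"index": "row2"})

# One inner merge per key; keep only the matched row-id pairs from each.
on_ship = left.merge(right, on="ShipNumber")[["row1", "row2"]]
on_track = left.merge(right, on="TrackNumber")[["row1", "row2"]]

# Union of both match sets; a pair that matches on both keys appears once.
pairs = pd.concat([on_ship, on_track]).drop_duplicates().reset_index(drop=True)

# Attach the full rows back; overlapping column names get suffixes.
result = (
    pairs.merge(left, on="row1")
         .merge(right, on="row2", suffixes=("_tab1", "_tab2"))
)
print(result)
```

This behaves like an inner join: rows of tab1 with no match on either key are dropped. If keys repeat within a table, each merge can produce many pairs, so the result may grow quickly.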

Answer 1 (score: 0)

I would suggest this alternative way of doing the merge. It seems easier to me.


table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"],
    axis=1)

If needed, you can also add the same column to table2, and then use it in the merge as required.
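
The row-wise apply above can also be written with fillna. This is a sketch, assuming missing ShipNumber values are stored as None/NaN; the sample data is made up. Note that it builds a single fallback key (ShipNumber when present, otherwise TrackNumber), which is narrower than a true OR match on either column:

```python
import pandas as pd

# Hypothetical sample data; None marks a missing ShipNumber.
table1 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
table2 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S4"],
    "TrackNumber": ["T9", "T2", "T3"],
    "AmountReceived": [100, 200, 300],
})

# Same rule as the apply/lambda version: ShipNumber if present, else TrackNumber.
for t in (table1, table2):
    t["id_to_be_merged"] = t["ShipNumber"].fillna(t["TrackNumber"])

# Merge on the fallback key only.
merged = table1.merge(
    table2[["id_to_be_merged", "AmountReceived"]],
    how="left",
    on="id_to_be_merged",
)
print(merged)
```

In this sample the third rows share TrackNumber T3 but are not matched, because each has a ShipNumber and those differ. That is the case where this approach and an either/or match give different results.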