Suppose I have two DataFrames with the following columns:
table 1 columns:
[ShipNumber, TrackNumber, ShipDate, Quantity, Weight]
table 2 columns:
[ShipNumber, TrackNumber, AmountReceived]
I want to merge the two tables on ShipNumber and TrackNumber. However, if I simply use merge like this (pseudocode, not real code):
tab1.merge(tab2, how="left", on=['ShipNumber', 'TrackNumber'])
then the values in both the ShipNumber and TrackNumber columns must match across the two tables.
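To illustrate the behaviour described above, here is a small sketch with made-up values (only the column names come from the question; the data is hypothetical). It shows that passing both columns to on attaches a tab2 row only when both keys agree:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100, 200, 300],
})

# A tab2 row is attached only when ShipNumber AND TrackNumber both agree.
both_keys = tab1.merge(tab2, how="left", on=["ShipNumber", "TrackNumber"])
print(both_keys)
```

In this sample only the S1/T1 row receives an AmountReceived value; the other two rows agree on just one key and are left with NaN.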
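In my case, however, sometimes the ShipNumber values match and sometimes the TrackNumber values match; as long as either of the two values matches for a row, I want the merge to happen.
In other words, if the ShipNumber in row 1 of tab1 matches the ShipNumber in row 3 of tab2, but the TrackNumber values of those rows do not match, I still want those two rows to be matched.
So basically this is an either/or matching condition (pseudocode):
if tab1.ShipNumber == tab2.ShipNumber OR tab1.TrackNumber == tab2.TrackNumber:
    then merge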
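One literal way to express the either/or condition above is a cross join followed by a filter. This is only a sketch under assumptions: the sample values and the _1/_2 suffixes are made up, and how="cross" needs pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100, 200, 300],
})

# Pair every row of tab1 with every row of tab2.
pairs = tab1.merge(tab2, how="cross", suffixes=("_1", "_2"))

# Keep a pair when either key agrees.
either = (
    (pairs["ShipNumber_1"] == pairs["ShipNumber_2"])
    | (pairs["TrackNumber_1"] == pairs["TrackNumber_2"])
)
matched = pairs.loc[either].reset_index(drop=True)
print(matched)
```

The cross join builds len(tab1) * len(tab2) rows before filtering, so this is probably only practical for small tables. It also behaves like an inner join: tab1 rows with no match on either key are dropped.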
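I hope my question makes sense... Any help is really appreciated!
As suggested, I looked at this post: Python pandas merge with OR logic. However, I don't think it is quite the same problem, because the OP in that post has a mapping file, so they can simply do two merges to solve it. I don't have a mapping file; instead, I have two DataFrames with the same key columns (ShipNumber, TrackNumber).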
Answer 0 (score: 3)
Use merge() and concat(), then drop any duplicate cases where both A and B match (thanks to @Scott Boston for the last step).
import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})
    df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5
With these DataFrames, we should see that:
df1.loc[2] matches df2.loc[0] on A
df1.loc[1] matches df2.loc[2] on B
df1.loc[3] matches df2.loc[3] on both A and B
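We will use suffixes to keep track of which key produced each match:
suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']
# sort=True orders the combined columns alphabetically, as in the output below
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)], sort=True)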
A A_on_B_match_1 A_on_B_match_2 B B_on_A_match_1 B_on_A_match_2
0 1.0 NaN NaN NaN 9.0 4.0
1 4.0 NaN NaN NaN 5.0 5.0
0 NaN 2.0 6.0 8.0 NaN NaN
1 NaN 4.0 4.0 5.0 NaN NaN
Note that the second and fourth rows are the same match (in both DataFrames, A = 4 and B = 5). We need to remove one of them.
dupes = (df.B_on_A_match_1 == df.B_on_A_match_2) # also could remove A_on_B_match
df.loc[~dupes]
A A_on_B_match_1 A_on_B_match_2 B B_on_A_match_1 B_on_A_match_2
0 1.0 NaN NaN NaN 9.0 4.0
0 NaN 2.0 6.0 8.0 NaN NaN
1 NaN 4.0 4.0 5.0 NaN NaN
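The same merge-then-concat idea can be applied to the question's column names. The sketch below is one possible variant, not the answer author's code: it tracks matched pairs by row label instead of by suffix, so duplicates can be removed with drop_duplicates(). The sample values and the helper column names row_1 and row_2 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S9", "S1", "S2"],
    "TrackNumber": ["T3", "T1", "T8"],
    "AmountReceived": [300, 100, 200],
})

# Expose each table's row label as a column so matched pairs can be identified.
left = tab1.reset_index().rename(columns={"index": "row_1"})
right = tab2.reset_index().rename(columns={"index": "row_2"})

# One inner merge per key; keep only the pair of row labels.
on_ship = left.merge(right, on="ShipNumber")[["row_1", "row_2"]]
on_track = left.merge(right, on="TrackNumber")[["row_1", "row_2"]]

# A pair that matched on both keys appears twice, so drop the repeat.
pairs = pd.concat([on_ship, on_track]).drop_duplicates().reset_index(drop=True)

# Attach the full rows from both tables to each pair of labels.
result = (
    pairs.merge(left, on="row_1")
         .merge(right, on="row_2", suffixes=("_1", "_2"))
)
print(result)
```

Here the S1/T1 rows match on both keys but end up in the result only once, while the S2 rows match on ShipNumber alone and the T3 rows on TrackNumber alone.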
Answer 1 (score: 0)
I would suggest this alternative way of doing the merge. It seems easier to me.
table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)
If needed, you can also add the same column to table2 and then use it as the merge key, depending on your requirements.
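As a rough end-to-end sketch of this idea, the fallback key can be built without a row-wise apply by using Series.fillna, and then used as the merge key. The sample data is hypothetical, and it assumes that each row has at least one of the two keys populated.

```python
import pandas as pd

# Hypothetical sample data in which some keys are missing.
table1 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": ["T1", "T2", None],
    "Quantity": [10, 20, 30],
})
table2 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": [None, "T2", "T3"],
    "AmountReceived": [100, 200, 300],
})

# Vectorized form of the row-wise apply: ShipNumber, falling back to TrackNumber.
for table in (table1, table2):
    table["id_to_be_merged"] = table["ShipNumber"].fillna(table["TrackNumber"])

# Merge on the single combined key.
merged = table1.merge(
    table2[["id_to_be_merged", "AmountReceived"]],
    how="left",
    on="id_to_be_merged",
)
print(merged)
```

Note that a row whose ShipNumber is populated is matched on ShipNumber only, so a pair that disagrees on ShipNumber but agrees on TrackNumber would not be found this way.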