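I have two pandas DataFrames that I want to merge. They have different columns and overlapping indexes. I want to merge them while keeping the order of the index intact.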
DataFrame (d1)
                                Dec 16  Dec 15
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN  <-- 'all Nan' row
Other Long Term Liabilities       8.37    9.30
Long Term Provisions             13.53   12.74  <-- Not present in d2
Total Non-Current Liabilities    21.90   22.04
CURRENT LIABILITIES                NaN     NaN  <-- 'all Nan' row
Trade Payables                   32.49   24.26
DataFrame (d2)
                                Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN
Deferred Tax Liabilities [Net]    0.00    7.40  <-- Not present in d1
Other Long Term Liabilities      14.13    0.00
Total Non-Current Liabilities    14.13    7.40
CURRENT LIABILITIES                NaN     NaN
Trade Payables                   77.35   60.40
I tried the following approaches to merge these DataFrames, but none of them worked:
d1.merge(d2, how='left', left_index=True,right_index=True)
d1.merge(d2, how='outer', left_index=True,right_index=True)
pd.merge_ordered(d1,d2,left_on=['Dec 16'],right_on=['Dec 11'])
pd.concat([d1.merge(d2, how='left', left_index=True,right_index=True),d1.merge(d2, how='right', left_index=True,right_index=True)]).drop_duplicates(subset='Dec 16',keep='last')
I want the resulting DataFrame to look like this:
                                Dec 16  Dec 15  Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
Deferred Tax Liabilities [Net]     NaN     NaN    0.00    7.40  <-- from d2
Other Long Term Liabilities       8.37    9.30   14.13    0.00  <-- d1+d2 merged
Long Term Provisions             13.53   12.74     NaN     NaN  <-- from d1
Total Non-Current Liabilities    21.90   22.04   14.13    7.40  <-- d1+d2 merged
CURRENT LIABILITIES                NaN     NaN     NaN     NaN
Trade Payables                   32.49   24.26   77.35   60.40
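Note that the overall order matters (for example, the all-NaN rows must stay in the same relative order); the order of the merged index entries between two all-NaN rows does not. Also, the d1 columns should come before the d2 columns.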
Answer 0 (score: 0)
Use merge with how='outer', then reindex with a custom order (df1 and df2 below are the question's d1 and d2):
In [1424]: order_index = ['NON-CURRENT LIABILITIES', 'Deferred Tax Liabilities [Net]',
                          'Other Long Term Liabilities', 'Long Term Provisions',
                          'Total Non-Current Liabilities', 'CURRENT LIABILITIES',
                          'Trade Payables']
In [1425]: df1.merge(df2, how='outer', left_index=True, right_index=True).reindex(order_index)
Out[1425]:
                                Dec 16  Dec 15  Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
Other Long Term Liabilities       8.37    9.30   14.13     0.0
Long Term Provisions             13.53   12.74     NaN     NaN
Total Non-Current Liabilities    21.90   22.04   14.13     7.4
CURRENT LIABILITIES                NaN     NaN     NaN     NaN
Trade Payables                   32.49   24.26   77.35    60.4
join works as well:
In [1426]: df1.join(df2, how='outer').reindex(order_index)
Out[1426]:
                                Dec 16  Dec 15  Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
Other Long Term Liabilities       8.37    9.30   14.13     0.0
Long Term Provisions             13.53   12.74     NaN     NaN
Total Non-Current Liabilities    21.90   22.04   14.13     7.4
CURRENT LIABILITIES                NaN     NaN     NaN     NaN
Trade Payables                   32.49   24.26   77.35    60.4
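Hard-coding order_index is fine for one pair of statements, but the list could also be derived from the two indexes. The sketch below is not part of the original answer: merged_order is a hypothetical helper that walks the first index and slots in the labels found only in the second index just before the next shared label. It assumes the labels are unique within each index.

```python
def merged_order(left, right):
    """Combine two label sequences, keeping the order of `left` and
    inserting labels found only in `right` before the next shared label.

    Hypothetical helper; assumes labels are unique within each sequence.
    """
    left = list(left)
    right = list(right)
    left_set = set(left)
    right_pos = {label: i for i, label in enumerate(right)}

    order = []
    pos = 0  # first position in `right` that has not been consumed yet
    for label in left:
        if label in right_pos and right_pos[label] >= pos:
            # Emit the right-only labels that sit before this shared label.
            for other in right[pos:right_pos[label]]:
                if other not in left_set:
                    order.append(other)
            pos = right_pos[label] + 1
        order.append(label)

    # Right-only labels that come after the last shared label go at the end.
    order.extend(other for other in right[pos:] if other not in left_set)
    return order


# Index labels of the two frames from the question.
d1_labels = ['NON-CURRENT LIABILITIES', 'Other Long Term Liabilities',
             'Long Term Provisions', 'Total Non-Current Liabilities',
             'CURRENT LIABILITIES', 'Trade Payables']
d2_labels = ['NON-CURRENT LIABILITIES', 'Deferred Tax Liabilities [Net]',
             'Other Long Term Liabilities', 'Total Non-Current Liabilities',
             'CURRENT LIABILITIES', 'Trade Payables']

order_index = merged_order(d1_labels, d2_labels)
print(order_index)
```

With real frames the same helper could be fed the indexes directly, for example df1.join(df2, how='outer').reindex(merged_order(df1.index, df2.index)). Where a d1-only and a d2-only label fall between the same two shared labels, this sketch puts the d2-only one first, which the question says is acceptable.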
Details
In [1417]: df1
Out[1417]:
                                Dec 16  Dec 15
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN
Other Long Term Liabilities       8.37    9.30
Long Term Provisions             13.53   12.74
Total Non-Current Liabilities    21.90   22.04
CURRENT LIABILITIES                NaN     NaN
Trade Payables                   32.49   24.26
In [1418]: df2
Out[1418]:
                                Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN
Deferred Tax Liabilities [Net]    0.00     7.4
Other Long Term Liabilities      14.13     0.0
Total Non-Current Liabilities    14.13     7.4
CURRENT LIABILITIES                NaN     NaN
Trade Payables                   77.35    60.4
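To reproduce this outside the IPython session, the two frames can be rebuilt by hand. The sketch below is an illustration, not the original poster's code: it constructs df1 and df2 from the values shown above, then compares a left merge, an outer merge, and the outer merge followed by reindex. As far as I know, a left merge keeps d1's row order but has no row for labels that exist only in d2, while pandas documents that an outer merge sorts the keys lexicographically, which would explain why the question's attempts did not give the wanted order.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the values printed in the question.
df1 = pd.DataFrame(
    {"Dec 16": [np.nan, 8.37, 13.53, 21.90, np.nan, 32.49],
     "Dec 15": [np.nan, 9.30, 12.74, 22.04, np.nan, 24.26]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Other Long Term Liabilities",
         "Long Term Provisions", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)
df2 = pd.DataFrame(
    {"Dec 11": [np.nan, 0.00, 14.13, 14.13, np.nan, 77.35],
     "Dec 10": [np.nan, 7.40, 0.00, 7.40, np.nan, 60.40]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
         "Other Long Term Liabilities", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)

# The row order the question asks for.
order_index = ["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
               "Other Long Term Liabilities", "Long Term Provisions",
               "Total Non-Current Liabilities", "CURRENT LIABILITIES",
               "Trade Payables"]

# Left merge: d1's rows in d1's order, so the d2-only row is missing.
left = df1.merge(df2, how="left", left_index=True, right_index=True)

# Outer merge: all seven rows, but not in the order the question wants.
outer = df1.merge(df2, how="outer", left_index=True, right_index=True)

# Outer merge plus reindex: all rows, in the requested order.
result = outer.reindex(order_index)

print(list(left.index))
print(list(outer.index))
print(result)
```

Since df1 is the left operand, its columns come first in the result, which matches the requirement that the d1 columns precede the d2 columns.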