Merging DataFrames with an ordered index and different columns

Time: 2017-10-13 03:34:08

Tags: python-3.x pandas join merge

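I have two pandas DataFrames that I want to merge. They have different columns and overlapping indexes. I want to merge them while keeping the order of the index intact.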

DataFrame (d1)

                              Dec 16 Dec 15   
Balance Sheet                     
NON-CURRENT LIABILITIES          NaN    NaN   <-- 'all Nan' row
Other Long Term Liabilities     8.37   9.30
Long Term Provisions           13.53  12.74   <-- Not present in d2
Total Non-Current Liabilities  21.90  22.04
CURRENT LIABILITIES              NaN    NaN   <-- 'all Nan' row
Trade Payables                 32.49  24.26

DataFrame (d2)

                               Dec 11 Dec 10
Balance Sheet                     
NON-CURRENT LIABILITIES           NaN    NaN
Deferred Tax Liabilities [Net]   0.00   7.40   <-- Not present in d1
Other Long Term Liabilities     14.13   0.00
Total Non-Current Liabilities   14.13   7.40
CURRENT LIABILITIES               NaN    NaN
Trade Payables                  77.35  60.40

I tried the following approaches to merge these DataFrames, but none of them gave the result I need.

d1.merge(d2, how='left', left_index=True,right_index=True)

d1.merge(d2, how='outer', left_index=True,right_index=True)

pd.merge_ordered(d1,d2,left_on=['Dec 16'],right_on=['Dec 11'])

pd.concat([d1.merge(d2, how='left', left_index=True,right_index=True),d1.merge(d2, how='right', left_index=True,right_index=True)]).drop_duplicates(subset='Dec 16',keep='last')

I would like the resulting DataFrame to look like this:

                              Dec 16 Dec 15 Dec 11 Dec 10
Balance Sheet                    
NON-CURRENT LIABILITIES          NaN    NaN  NaN    NaN
Deferred Tax Liabilities [Net]   NaN    NaN  0.00   7.40    <-- from d2
Other Long Term Liabilities     8.37   9.30  14.13  0.00    <-- d1+d2 merged
Long Term Provisions           13.53  12.74  NaN    NaN     <-- from d1
Total Non-Current Liabilities  21.90  22.04  14.13  7.40    <-- d1+d2 merged
CURRENT LIABILITIES              NaN    NaN  NaN    NaN
Trade Payables                 32.49  24.26  77.35  60.40

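Note that the overall order matters (for example, the all-NaN rows must stay in the same order), but the order of the merged index entries between the all-NaN rows does not. Also, the d1 columns should come before the d2 columns.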

1 Answer:

Answer 0 (score: 0)

Use merge with how='outer', followed by reindex with a custom order

In [1424]: order_index =  ['NON-CURRENT LIABILITIES',  'Deferred Tax Liabilities [Net]',  
                           'Other Long Term Liabilities',  'Long Term Provisions',  
                           'Total Non-Current Liabilities',  'CURRENT LIABILITIES',
                           'Trade Payables']

In [1425]: df1.merge(df2,how='outer',left_index=True,right_index=True).reindex(order_index)
Out[1425]:
                                Dec 16  Dec 15  Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
Other Long Term Liabilities       8.37    9.30   14.13     0.0
Long Term Provisions             13.53   12.74     NaN     NaN
Total Non-Current Liabilities    21.90   22.04   14.13     7.4
CURRENT LIABILITIES                NaN     NaN     NaN     NaN
Trade Payables                   32.49   24.26   77.35    60.4
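To illustrate why the reindex step is needed, the sketch below rebuilds the two frames from the tables in the question and compares join types. A left merge keeps df1's row order but has no row for labels that exist only in df2. An outer merge includes every label, but according to the pandas documentation it sorts the union of keys lexicographically, so the original order is lost until reindex restores it. How the frames are constructed here is an assumption for illustration; the original post does not show it.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df1 = pd.DataFrame(
    {"Dec 16": [np.nan, 8.37, 13.53, 21.90, np.nan, 32.49],
     "Dec 15": [np.nan, 9.30, 12.74, 22.04, np.nan, 24.26]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Other Long Term Liabilities",
         "Long Term Provisions", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)
df2 = pd.DataFrame(
    {"Dec 11": [np.nan, 0.00, 14.13, 14.13, np.nan, 77.35],
     "Dec 10": [np.nan, 7.40, 0.00, 7.40, np.nan, 60.40]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
         "Other Long Term Liabilities", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)

# Left merge: df1's order is kept, but df2-only rows are missing.
left = df1.merge(df2, how="left", left_index=True, right_index=True)

# Outer merge: all labels are present, but not in the original order.
outer = df1.merge(df2, how="outer", left_index=True, right_index=True)

print(list(left.index))
print(list(outer.index))
```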

Alternatively, join also works

In [1426]: df1.join(df2, how='outer').reindex(order_index)
Out[1426]:
                                Dec 16  Dec 15  Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
Other Long Term Liabilities       8.37    9.30   14.13     0.0
Long Term Provisions             13.53   12.74     NaN     NaN
Total Non-Current Liabilities    21.90   22.04   14.13     7.4
CURRENT LIABILITIES                NaN     NaN     NaN     NaN
Trade Payables                   32.49   24.26   77.35    60.4
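As a quick check that the two spellings are interchangeable here, the following sketch runs both on frames rebuilt from the question and compares the results. The frame construction is assumed for illustration; order_index is the same list used in the answer.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df1 = pd.DataFrame(
    {"Dec 16": [np.nan, 8.37, 13.53, 21.90, np.nan, 32.49],
     "Dec 15": [np.nan, 9.30, 12.74, 22.04, np.nan, 24.26]},
    index=["NON-CURRENT LIABILITIES", "Other Long Term Liabilities",
           "Long Term Provisions", "Total Non-Current Liabilities",
           "CURRENT LIABILITIES", "Trade Payables"],
)
df2 = pd.DataFrame(
    {"Dec 11": [np.nan, 0.00, 14.13, 14.13, np.nan, 77.35],
     "Dec 10": [np.nan, 7.40, 0.00, 7.40, np.nan, 60.40]},
    index=["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
           "Other Long Term Liabilities", "Total Non-Current Liabilities",
           "CURRENT LIABILITIES", "Trade Payables"],
)

order_index = ["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
               "Other Long Term Liabilities", "Long Term Provisions",
               "Total Non-Current Liabilities", "CURRENT LIABILITIES",
               "Trade Payables"]

# Index-on-index outer merge, then restore the desired row order.
merged = df1.merge(df2, how="outer", left_index=True,
                   right_index=True).reindex(order_index)

# join aligns on the index by default, so no left_index/right_index is needed.
joined = df1.join(df2, how="outer").reindex(order_index)

# DataFrame.equals treats NaNs in matching positions as equal.
print(merged.equals(joined))
```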

Details

In [1417]: df1
Out[1417]:
                               Dec 16  Dec 15
Balance Sheet
NON-CURRENT LIABILITIES           NaN     NaN
Other Long Term Liabilities      8.37    9.30
Long Term Provisions            13.53   12.74
Total Non-Current Liabilities   21.90   22.04
CURRENT LIABILITIES               NaN     NaN
Trade Payables                  32.49   24.26

In [1418]: df2
Out[1418]:
                                Dec 11  Dec 10
Balance Sheet
NON-CURRENT LIABILITIES            NaN     NaN
Deferred Tax Liabilities [Net]    0.00     7.4
Other Long Term Liabilities      14.13     0.0
Total Non-Current Liabilities    14.13     7.4
CURRENT LIABILITIES                NaN     NaN
Trade Payables                   77.35    60.4
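The answer hard-codes order_index. If the list is too long to write by hand, one possible way to derive it from the two indexes is sketched below. The helper merged_order is hypothetical (it is not part of pandas or of the original answer), and it assumes that labels are unique within each frame and that labels shared by both frames appear in the same relative order in each. Labels found only in df2 are placed just before the next shared label, which satisfies the question's requirement that only the overall order matters.

```python
import numpy as np
import pandas as pd

def merged_order(left, right):
    """Hypothetical helper: interleave two ordered label sequences.

    Assumes unique labels and that shared labels appear in the same
    relative order in both sequences.
    """
    left, right = list(left), list(right)
    left_set, right_set = set(left), set(right)
    order = []
    j = 0  # position in `right`
    for label in left:
        if label in right_set:
            # Emit right-only labels that precede this shared label.
            while j < len(right) and right[j] != label:
                if right[j] not in left_set:
                    order.append(right[j])
                j += 1
            j += 1  # step past the shared label itself
        order.append(label)
    # Any right-only labels after the last shared label go at the end.
    order.extend(x for x in right[j:] if x not in left_set)
    return order

# Frames rebuilt from the tables above (construction is assumed).
df1 = pd.DataFrame(
    {"Dec 16": [np.nan, 8.37, 13.53, 21.90, np.nan, 32.49],
     "Dec 15": [np.nan, 9.30, 12.74, 22.04, np.nan, 24.26]},
    index=["NON-CURRENT LIABILITIES", "Other Long Term Liabilities",
           "Long Term Provisions", "Total Non-Current Liabilities",
           "CURRENT LIABILITIES", "Trade Payables"],
)
df2 = pd.DataFrame(
    {"Dec 11": [np.nan, 0.00, 14.13, 14.13, np.nan, 77.35],
     "Dec 10": [np.nan, 7.40, 0.00, 7.40, np.nan, 60.40]},
    index=["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
           "Other Long Term Liabilities", "Total Non-Current Liabilities",
           "CURRENT LIABILITIES", "Trade Payables"],
)

order_index = merged_order(df1.index, df2.index)
result = df1.join(df2, how="outer").reindex(order_index)
print(result)
```

For these two frames the derived list matches the hand-written order_index from the answer. If shared labels can appear in a different order in each frame, this helper is not sufficient and the order would need to be specified explicitly.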