如何在时间序列上合并两个数据帧

时间:2018-06-23 23:55:33

标签: python pandas dataframe merge

我要创建的是一个看起来像这样的数据框:

    amount  months      category    
0   6460    2018-01-31  budgeted    
1   7905    2018-01-31  actual  
2   11509   2018-02-28  budgeted    
3   21502   2018-02-28  actual 
...
...

我拥有的示例代码和正在使用的基础数据是:

import pandas as pd
import string
import altair as alt

from random import randint

# 
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months                  = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts           = [ "actual", "budgeted", "difference" ]

summary = []

summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )

index   = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )

summaryDF = pd.DataFrame( summary, index = index, columns = columns )

#
# From this point, I am trying to transform the summaryDF into something 
# I can use in a different context...
#

budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )

idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual   = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()

budgeted.index = budgetMonths
actual.index = budgetMonths

budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF   = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })

print( budgetedDF )
print( actualDF )

df3 = pd.merge( budgetedDF, actualDF, on = 'months' )
df3 

df3看起来像:

    amount_x    months      category_x  amount_y    category_y
0   6460        2018-01-31  budgeted    7905        actual
1   11509       2018-02-28  budgeted    21502       actual
...
...

我想我已经快要得到想要的东西了……只需要最后的合并步骤即可。

2 个答案:

答案 0 :(得分:1)

使用pd.concat“合并”这些数据框。

df3 = (pd.concat([budgetedDF, actualDF])
         .sort_index()
         .reset_index(drop=True)
)

df3

    amount  category     months
0     2183  budgeted 2018-01-31
1     3612    actual 2018-01-31
2     8902  budgeted 2018-02-28
3     3357    actual 2018-02-28
4     9956  budgeted 2018-03-30
5     2828    actual 2018-03-30
6    14475  budgeted 2018-04-30
7     2990    actual 2018-04-30
8    25385  budgeted 2018-05-31
9     4446    actual 2018-05-31
10   19119    actual 2018-06-29
11   29119  budgeted 2018-06-29
12   27296    actual 2018-07-31
13   40869  budgeted 2018-07-31
14   38443    actual 2018-08-31
15   43400  budgeted 2018-08-31
16   47978    actual 2018-09-28
17   52686  budgeted 2018-09-28
18   49612    actual 2018-10-31
19   63384  budgeted 2018-10-31
20   49272    actual 2018-11-30
21   74107  budgeted 2018-11-30
22   83653  budgeted 2018-12-31
23   56907    actual 2018-12-31

但是,您可能更喜欢这种表示形式:

df3 = (pd.concat([budgetedDF, actualDF])
         .drop('months', 1)
         .set_index('category', append=True)
         .unstack()
)

df3
           amount         
category   actual budgeted
2018-01-31   3612     2183
2018-02-28   3357     8902
2018-03-30   2828     9956
2018-04-30   2990    14475
2018-05-31   4446    25385
2018-06-29  19119    29119
2018-07-31  27296    40869
2018-08-31  38443    43400
2018-09-28  47978    52686
2018-10-31  49612    63384
2018-11-30  49272    74107
2018-12-31  56907    83653

答案 1 :(得分:0)

您想要的声音

pd.concat([budgetedDF, actualDF]).sort_values('months').reset_index(drop=True)