Slicing a DataFrame with multi-index columns to get a new DataFrame

Time: 2018-06-23 17:14:34

Tags: python pandas dataframe slice multi-index

import pandas as pd

from random import randint

months                  = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts           = [ "actual", "budgeted", "difference" ]

# Three rows of random sample data, one value per (month, category) column
summary = []

summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )

index   = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )

summaryDF = pd.DataFrame( summary, index = index, columns = columns )

# Last business day of each month in 2018
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )

# Cumulative sums of the 'budgeted' and 'actual' columns of the Difference row
idx = pd.IndexSlice
budgetDifference = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
budgetActual     = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()

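What I want is a DataFrame containing only the actual and budgeted columns of the Difference row for each month, plus an additional column holding the month dates (I ultimately need that extra column to generate a plot).

If I just do this: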

budgetDifference = pd.DataFrame( { 'difference' : budgetDifference, 'months' : budgetMonths } )

I end up with a DataFrame that has difference and months columns:

                difference     months
month category
Jan   budgeted        1097 2018-01-31
Feb   budgeted       11476 2018-02-28
Mar   budgeted       11143 2018-03-30
Apr   budgeted       25082 2018-04-30
May   budgeted       28019 2018-05-31
Jun   budgeted       37164 2018-06-29
Jul   budgeted       36747 2018-07-31
Aug   budgeted       44651 2018-08-31
Sep   budgeted       54283 2018-09-28
Oct   budgeted       62728 2018-10-31
Nov   budgeted       76144 2018-11-30
Dec   budgeted       77781 2018-12-31

However, when I try:

budgetDifference = pd.DataFrame( { 'difference' : budgetDifference, 'actual' : budgetActual, 'months' : budgetMonths } )

I get:

ValueError: array length 12 does not match index length 24

I'm not sure why.

1 answer:

Answer 0 (score: 1)

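You need to align the indexes of the Series that make up the DataFrame. budgetDifference is indexed by (month, 'budgeted') and budgetActual by (month, 'actual'), so none of their labels match. The combined index therefore has 24 entries, and the 12 dates in budgetMonths no longer fit it. Reusing the index of budgetDifference for the actual values avoids this: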

res = pd.DataFrame({'difference': budgetDifference,
                    'months': budgetMonths,
                    'actual': pd.Series(budgetActual.values, index=budgetDifference.index)})

print(res)

                difference     months  actual
month category                               
Jan   budgeted        4057 2018-01-31    1592
Feb   budgeted        4550 2018-02-28    2211
Mar   budgeted        3847 2018-03-30    4096
Apr   budgeted       12970 2018-04-30    9588
May   budgeted       17459 2018-05-31   19623
Jun   budgeted       30884 2018-06-29   32347
Jul   budgeted       35258 2018-07-31   37205
Aug   budgeted       35823 2018-08-31   50234
Sep   budgeted       47599 2018-09-28   57188
Oct   budgeted       61258 2018-10-31   71096
Nov   budgeted       65914 2018-11-30   71904
Dec   budgeted       73814 2018-12-31   77308
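
As a possible alternative to the above, here is a minimal sketch that drops the category level from both Series so they share a plain month index. It assumes a pandas version that provides Series.droplevel (0.24 or later). The sample values are deterministic stand-ins rather than the random data from the question, and the names combined and res are only illustrative. The dates are built with pd.offsets.BMonthEnd() instead of the 'BM' alias, which is an assumption made so the sketch does not depend on a particular frequency alias.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
monthyAmounts = ['actual', 'budgeted', 'difference']

# Deterministic stand-in data (assumption): 3 rows x 36 columns
n = len(months) * len(monthyAmounts)
summary = [[row * 100 + col for col in range(n)] for row in range(3)]

index = pd.Index(['Income', 'Expenses', 'Difference'], name='type')
columns = pd.MultiIndex.from_product([months, monthyAmounts],
                                     names=['month', 'category'])
summaryDF = pd.DataFrame(summary, index=index, columns=columns)

# Last business day of each month in 2018
budgetMonths = pd.date_range('2018-01-01', periods=12,
                             freq=pd.offsets.BMonthEnd())

idx = pd.IndexSlice
budgetDifference = summaryDF.loc['Difference', idx[:, 'budgeted']].cumsum()
budgetActual = summaryDF.loc['Difference', idx[:, 'actual']].cumsum()

# The two Series share no (month, category) labels,
# so the union of their indexes has 24 entries, not 12
combined = budgetDifference.index.union(budgetActual.index)
print(len(combined))  # 24

# Drop the 'category' level so both Series are indexed by month only
res = pd.DataFrame({
    'difference': budgetDifference.droplevel('category'),
    'actual': budgetActual.droplevel('category'),
    'months': budgetMonths,
})
print(res)
```

Since the index is then just the month name, res.reset_index() would turn it into an ordinary column if that is more convenient for plotting.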