Pandas: resample with percentage change

Time: 2019-05-22 03:13:33

Tags: pandas

I am trying to resample my DataFrame to get yearly data with percentage change.

Here is my DataFrame.

import pandas as pd

data = {'year': ['2000', '2000', '2003', '2003', '2005', '2005'],
    'country':['UK', 'US', 'UK','US','UK','US'],
    'sales': [0, 10, 30, 25, 40, 45],
    'cost': [0, 100, 300, 250, 400, 450]
    }
df=pd.DataFrame(data)
dfL=df.copy()
dfL.year=dfL.year.astype('str') + '-01-01 00:00:00.00000'
dfL.year=pd.to_datetime(dfL.year)
dfL=dfL.set_index('year')
dfL

           country  sales  cost
year
2000-01-01      UK      0     0
2000-01-01      US     10   100
2003-01-01      UK     30   300
2003-01-01      US     25   250
2005-01-01      UK     40   400
2005-01-01      US     45   450

I would like to get output like the following.

           country  sales  cost
year
2000-01-01      UK      0     0
2001-01-01      UK     10   100
2002-01-01      UK     20   200
2003-01-01      UK     30   300
2004-01-01      UK     35   350
2005-01-01      UK     40   400
2000-01-01      US     10   100
2001-01-01      US     15   150
2002-01-01      US     20   200
2003-01-01      US     25   250
2004-01-01      US     35   350
2005-01-01      US     45   450

I expect I need to resample to a yearly frequency, but I am not sure which function to apply. Can anyone help?

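2 Answers:

Answer 0: (score: 3)

Use resample + interpolate, together with the reshaping methods stack and unstack. Unstacking country into the columns gives each country its own series, so the missing years are interpolated per country.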

dfL=dfL.set_index('country',append=True).unstack().resample('YS').interpolate().stack().reset_index(level=1)
dfL
Out[309]: 
           country   cost  sales
year                            
2000-01-01      UK    0.0    0.0
2000-01-01      US  100.0   10.0
2001-01-01      UK  100.0   10.0
2001-01-01      US  150.0   15.0
2002-01-01      UK  200.0   20.0
2002-01-01      US  200.0   20.0
2003-01-01      UK  300.0   30.0
2003-01-01      US  250.0   25.0
2004-01-01      UK  350.0   35.0
2004-01-01      US  350.0   35.0
2005-01-01      UK  400.0   40.0
2005-01-01      US  450.0   45.0
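
The one-liner above may be easier to follow when split into its three steps. The following is a minimal sketch, assuming the sample data from the question; the intermediate names `wide`, `yearly` and `tidy` are illustrative and not part of the original answer, and the year column is parsed with `format="%Y"` rather than by string concatenation.

```python
import pandas as pd

data = {
    "year": ["2000", "2000", "2003", "2003", "2005", "2005"],
    "country": ["UK", "US", "UK", "US", "UK", "US"],
    "sales": [0, 10, 30, 25, 40, 45],
    "cost": [0, 100, 300, 250, 400, 450],
}
df = pd.DataFrame(data)
df["year"] = pd.to_datetime(df["year"], format="%Y")  # "2000" -> 2000-01-01

# 1. Put (year, country) in the index, then move country to the columns.
#    Each (value, country) pair is now its own column.
wide = df.set_index(["year", "country"]).unstack("country")

# 2. Upsample to year starts; the new rows (2001, 2002, 2004) are empty
#    until interpolate() fills them linearly.
yearly = wide.resample("YS").interpolate()

# 3. Move country back from the columns into an ordinary column.
tidy = yearly.stack("country").reset_index(level="country")

print(tidy)
```

`interpolate()` defaults to `method="linear"`, which treats the rows as equally spaced. Here the rows are consecutive years, so the UK sales of 0 (2000) and 30 (2003) are filled with 10 and 20.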

Answer 1: (score: 1)

I would do this with a pivot table and then resample:

In [11]: res = dfL.pivot_table(index="year", columns="country", values=["sales", "cost"])

In [12]: res
Out[12]:
           cost      sales
country      UK   US    UK  US
year
2000-01-01    0  100     0  10
2003-01-01  300  250    30  25
2005-01-01  400  450    40  45

In [13]: res.resample("YS").interpolate()
Out[13]:
             cost        sales
country        UK     US    UK    US
year
2000-01-01    0.0  100.0   0.0  10.0
2001-01-01  100.0  150.0  10.0  15.0
2002-01-01  200.0  200.0  20.0  20.0
2003-01-01  300.0  250.0  30.0  25.0
2004-01-01  350.0  350.0  35.0  35.0
2005-01-01  400.0  450.0  40.0  45.0

Personally I would keep it in this format, but if you want to stack it back, you can stack and reset_index:

In [14]: res.resample("YS").interpolate().stack(level=1).reset_index(level=1)
Out[14]:
           country   cost  sales
year
2000-01-01      UK    0.0    0.0
2000-01-01      US  100.0   10.0
2001-01-01      UK  100.0   10.0
2001-01-01      US  150.0   15.0
2002-01-01      UK  200.0   20.0
2002-01-01      US  200.0   20.0
2003-01-01      UK  300.0   30.0
2003-01-01      US  250.0   25.0
2004-01-01      UK  350.0   35.0
2004-01-01      US  350.0   35.0
2005-01-01      UK  400.0   40.0
2005-01-01      US  450.0   45.0
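
The stacked result is ordered by year and then by country, whereas the question lists all UK rows before the US rows. Below is a small sketch of one way to get that ordering, assuming the same sample data; the names `wide`, `yearly`, `long_form` and `by_country` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "year": ["2000", "2000", "2003", "2003", "2005", "2005"],
    "country": ["UK", "US", "UK", "US", "UK", "US"],
    "sales": [0, 10, 30, 25, 40, 45],
    "cost": [0, 100, 300, 250, 400, 450],
})
df["year"] = pd.to_datetime(df["year"], format="%Y")

# Pivot to wide form, fill in the missing years, then stack country back.
wide = df.pivot_table(index="year", columns="country", values=["sales", "cost"])
yearly = wide.resample("YS").interpolate()
long_form = yearly.stack(level=1).reset_index(level=1)

# Reorder the rows: all years for one country, then the next country.
by_country = (
    long_form.reset_index()
    .sort_values(["country", "year"])
    .set_index("year")[["country", "sales", "cost"]]
)

print(by_country)
```

Note that `pivot_table` aggregates duplicate (year, country) pairs with the mean by default, so this sketch assumes one row per country per year, as in the question's data.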