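I am trying to resample a DataFrame to yearly data, filling the missing years in step with the change between the known values.
This is my DataFrame.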
import pandas as pd

data = {'year': ['2000', '2000', '2003', '2003', '2005', '2005'],
        'country': ['UK', 'US', 'UK', 'US', 'UK', 'US'],
        'sales': [0, 10, 30, 25, 40, 45],
        'cost': [0, 100, 300, 250, 400, 450]
        }
df = pd.DataFrame(data)
dfL = df.copy()
# Turn the year strings into January 1st timestamps and use them as the index
dfL.year = dfL.year.astype('str') + '-01-01 00:00:00.00000'
dfL.year = pd.to_datetime(dfL.year)
dfL = dfL.set_index('year')
dfL
           country  sales  cost
year
2000-01-01      UK      0     0
2000-01-01      US     10   100
2003-01-01      UK     30   300
2003-01-01      US     25   250
2005-01-01      UK     40   400
2005-01-01      US     45   450
I would like to get output like the following.
           country  sales  cost
year
2000-01-01      UK      0     0
2001-01-01      UK     10   100
2002-01-01      UK     20   200
2003-01-01      UK     30   300
2004-01-01      UK     35   350
2005-01-01      UK     40   400
2000-01-01      US     10   100
2001-01-01      US     15   150
2002-01-01      US     20   200
2003-01-01      US     25   250
2004-01-01      US     35   350
2005-01-01      US     45   450
I want to resample to a yearly frequency, but I am not sure which function to apply. Can anyone help?
Answer 0: (score: 3)
Use resample + interpolate, together with the reshaping methods stack and unstack:
dfL=dfL.set_index('country',append=True).unstack().resample('YS').interpolate().stack().reset_index(level=1)
dfL
Out[309]:
           country   cost  sales
year
2000-01-01      UK    0.0    0.0
2000-01-01      US  100.0   10.0
2001-01-01      UK  100.0   10.0
2001-01-01      US  150.0   15.0
2002-01-01      UK  200.0   20.0
2002-01-01      US  200.0   20.0
2003-01-01      UK  300.0   30.0
2003-01-01      US  250.0   25.0
2004-01-01      UK  350.0   35.0
2004-01-01      US  350.0   35.0
2005-01-01      UK  400.0   40.0
2005-01-01      US  450.0   45.0
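The one-liner above packs four steps into a single chain. As a rough sketch, it can be unpacked as follows. The intermediate names (wide, yearly, filled, tidy) are made up for illustration, and the year parsing uses pd.to_datetime with format="%Y" instead of string concatenation, which is an assumption of this sketch rather than part of the answer.

```python
import pandas as pd

data = {
    "year": ["2000", "2000", "2003", "2003", "2005", "2005"],
    "country": ["UK", "US", "UK", "US", "UK", "US"],
    "sales": [0, 10, 30, 25, 40, 45],
    "cost": [0, 100, 300, 250, 400, 450],
}
df = pd.DataFrame(data)
df["year"] = pd.to_datetime(df["year"], format="%Y")  # '2000' -> 2000-01-01
dfL = df.set_index("year")

# Step 1: move country into the columns, one column per (measure, country).
wide = dfL.set_index("country", append=True).unstack()

# Step 2: upsample to year-start frequency; the new years appear as NaN rows.
yearly = wide.resample("YS").asfreq()

# Step 3: fill the NaN rows by linear interpolation between known years.
filled = yearly.interpolate()

# Step 4: move country back into the rows and make it an ordinary column.
tidy = filled.stack().reset_index(level=1)
print(tidy)
```

Looking at yearly before interpolating is a handy way to check which years were actually missing. Depending on the pandas version, stack may emit a FutureWarning and the column order (cost, sales versus sales, cost) may differ, so it is safer to select columns by name.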
Answer 1: (score: 1)
I would do this with a pivot table and then resample:
In [11]: res = dfL.pivot_table(index="year", columns="country", values=["sales", "cost"])
In [12]: res
Out[12]:
             cost         sales
country        UK     US     UK     US
year
2000-01-01      0    100      0     10
2003-01-01    300    250     30     25
2005-01-01    400    450     40     45
In [13]: res.resample("YS").interpolate()
Out[13]:
             cost         sales
country        UK     US     UK     US
year
2000-01-01    0.0  100.0    0.0   10.0
2001-01-01  100.0  150.0   10.0   15.0
2002-01-01  200.0  200.0   20.0   20.0
2003-01-01  300.0  250.0   30.0   25.0
2004-01-01  350.0  350.0   35.0   35.0
2005-01-01  400.0  450.0   40.0   45.0
Personally I would leave it in this format, but if you want to stack it back, you can stack and reset_index:
In [14]: res.resample("YS").interpolate().stack(level=1).reset_index(level=1)
Out[14]:
           country   cost  sales
year
2000-01-01      UK    0.0    0.0
2000-01-01      US  100.0   10.0
2001-01-01      UK  100.0   10.0
2001-01-01      US  150.0   15.0
2002-01-01      UK  200.0   20.0
2002-01-01      US  200.0   20.0
2003-01-01      UK  300.0   30.0
2003-01-01      US  250.0   25.0
2004-01-01      UK  350.0   35.0
2004-01-01      US  350.0   35.0
2005-01-01      UK  400.0   40.0
2005-01-01      US  450.0   45.0
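Since the question asked which function to apply, here is a hedged sketch of a groupby-based variant. It is not taken from either answer: it resamples each country separately instead of reshaping to a wide frame first. The helper name fill_years is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "year": ["2000", "2000", "2003", "2003", "2005", "2005"],
    "country": ["UK", "US", "UK", "US", "UK", "US"],
    "sales": [0, 10, 30, 25, 40, 45],
    "cost": [0, 100, 300, 250, 400, 450],
})
df["year"] = pd.to_datetime(df["year"], format="%Y")  # '2000' -> 2000-01-01
dfL = df.set_index("year")


def fill_years(group):
    """Hypothetical helper: upsample one country's rows to yearly, then interpolate."""
    return group.resample("YS").interpolate()


per_country = (
    dfL.groupby("country", group_keys=True)[["sales", "cost"]]
    .apply(fill_years)              # index becomes (country, year)
    .reset_index(level="country")   # country back to an ordinary column
)
print(per_country)
```

This keeps all UK rows together followed by all US rows, which matches the layout requested in the question. One difference to keep in mind: each country is resampled over its own first-to-last year, whereas the wide approach uses the combined range of all countries.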