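I have a pandas DataFrame, shown below, with the minimum, maximum, and average sales of the petroleum product Light Diesel Oil. From it I want to build a DataFrame that shows the minimum, maximum, and average sales per 5-year interval (e.g. 2010-2014, 2015-2019, and so on), with the end year included in each interval.
Assume the DataFrame below is named "lightdiesel_df":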
   petroleum_product  year  max_sale  min_sale  avg_sale
0   Light Diesel Oil  2014         0         0       0.0
1   Light Diesel Oil  2013         0         0       0.0
2   Light Diesel Oil  2012       258       258     258.0
3   Light Diesel Oil  2011         0         0       0.0
4   Light Diesel Oil  2010       227       227     227.0
5   Light Diesel Oil  2009       238       238     238.0
6   Light Diesel Oil  2008       377       377     377.0
7   Light Diesel Oil  2007       306       306     306.0
8   Light Diesel Oil  2006       179       179     179.0
9   Light Diesel Oil  2005       290       290     290.0
10  Light Diesel Oil  2004        88        88      88.0
11  Light Diesel Oil  2003       577       577     577.0
12  Light Diesel Oil  2002       610       610     610.0
13  Light Diesel Oil  2001      2413      2413    2413.0
14  Light Diesel Oil  2000      3416      3416    3416.0
So basically, I would like to get the following output:
petroleum_product  year       min_sale  max_sale  avg_sale
Light Diesel Oil   2010-2014  227       258       242.5
Light Diesel Oil   2005-2009  179       377       278
Light Diesel Oil   2000-2004  88        3416      1420.8
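For anyone who wants to run the answers, here is a minimal sketch that rebuilds the input table. The values are copied from the question; the way the frame is constructed (one shared list for all three sale columns, since they are identical in the sample) is just a convenience, not part of the original post.

```python
import pandas as pd

# Yearly values copied from the question's table, ordered 2014 down to 2000.
sales = [0, 0, 258, 0, 227, 238, 377, 306, 179, 290, 88, 577, 610, 2413, 3416]

lightdiesel_df = pd.DataFrame({
    "petroleum_product": "Light Diesel Oil",   # scalar is broadcast to every row
    "year": list(range(2014, 1999, -1)),       # 2014, 2013, ..., 2000
    "max_sale": sales,
    "min_sale": sales,
    "avg_sale": [float(v) for v in sales],
})

print(lightdiesel_df.head())
```

The answers refer to this frame as df or df2, so assign it to whichever name the snippet you are trying expects.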
Answer 0 (score: 2)
Try using Grouper.
Pass the frequency (5 years) and the argument closed='left', as follows:
import pandas as pd

# df2 is assumed to be the question's DataFrame; Grouper needs a datetime column
df2['year'] = pd.to_datetime(df2['year'], format='%Y')

# '5Y' is the alias used when this was written; pandas 2.2+ spells it '5YE'
(df2.groupby(['petroleum_product', pd.Grouper(key='year', freq='5Y', closed='left')])
    .agg({'year': lambda x: '-'.join((str(min(x.dt.year)), str(max(x.dt.year)))),
          'max_sale': 'max',
          'min_sale': 'min',
          'avg_sale': 'mean'})
    .reset_index(level=0)
    .reset_index(drop=True)
)
# Output:
  petroleum_product       year  max_sale  min_sale  avg_sale
0  Light Diesel Oil  2000-2004      3416        88    1420.8
1  Light Diesel Oil  2005-2009       377       179     278.0
2  Light Diesel Oil  2010-2014       258         0      97.0
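The numbers above can be cross-checked without any datetime conversion. The sketch below is not the Grouper approach: it swaps in plain integer arithmetic and assumes year is an integer column and that the 5-year blocks start on multiples of five (2000, 2005, ...). The names start and period are made up for illustration.

```python
import pandas as pd

# Sample data from the question, ordered 2014 down to 2000.
sales = [0, 0, 258, 0, 227, 238, 377, 306, 179, 290, 88, 577, 610, 2413, 3416]
df = pd.DataFrame({
    "petroleum_product": "Light Diesel Oil",
    "year": list(range(2014, 1999, -1)),
    "max_sale": sales,
    "min_sale": sales,
    "avg_sale": [float(v) for v in sales],
})

# First year of each block: 2000..2004 -> 2000, 2005..2009 -> 2005, ...
start = df["year"] // 5 * 5
# Label such as "2000-2004" (illustrative helper, not from the answer).
period = start.astype(str) + "-" + (start + 4).astype(str)

result = (
    df.assign(year=period)
      .groupby(["petroleum_product", "year"], as_index=False)
      .agg(max_sale=("max_sale", "max"),
           min_sale=("min_sale", "min"),
           avg_sale=("avg_sale", "mean"))
)
print(result)
```

One difference worth noting: this labels every block by its nominal span (start to start + 4), whereas the lambda in the answer labels a block by the earliest and latest year actually present in the data.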
Answer 1 (score: 1)
You can also try pd.cut, after creating bins on the year column and labels formatted to match the expected output:
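import pandas as pd

# Bin edges every 5 years, starting at the earliest year
bins = list(range(df['year'].min(), df['year'].max() + 5, 5))
# bins: [2000, 2005, 2010, 2015]
# Label each bin by its first and last year
labels = [f"{a}-{b-1}" for a, b in zip(bins, bins[1:])]
# labels: ['2000-2004', '2005-2009', '2010-2014']
# right=False makes the bins half-open, e.g. [2000, 2005), so 2005 starts the next bin
s = pd.cut(df['year'], bins, labels=labels, include_lowest=True, right=False)
final = (df.assign(year=s)
           .groupby(['petroleum_product', 'year'], sort=False, as_index=False)
           .agg({'max_sale': 'max', 'min_sale': 'min', 'avg_sale': 'mean'}))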
  petroleum_product       year  max_sale  min_sale  avg_sale
0  Light Diesel Oil  2010-2014       258         0      97.0
1  Light Diesel Oil  2005-2009       377       179     278.0
2  Light Diesel Oil  2000-2004      3416        88    1420.8
Answer 2 (score: 0)
Please try pd.cut, which splits df into specific ranges:
# Right-closed bins: (1999, 2004], (2004, 2009], (2009, 2015]
df['year_range'] = pd.cut(df.year, [1999, 2004, 2009, 2015])
df_res = df.groupby(['petroleum_product', 'year_range']).agg({'max_sale': 'max',
                                                              'min_sale': 'min',
                                                              'avg_sale': 'mean'})
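As written, this labels the groups with Interval objects such as (1999, 2004] rather than strings like 2000-2004. A possible variation, sketched below, passes labels to pd.cut to get the format the question asks for. It assumes the last edge is moved from 2015 to 2014 so that every bin spans exactly five years, and it adds observed=True so that only bins that actually occur are returned; both are my additions, not part of the answer.

```python
import pandas as pd

# Sample data from the question, ordered 2014 down to 2000.
sales = [0, 0, 258, 0, 227, 238, 377, 306, 179, 290, 88, 577, 610, 2413, 3416]
df = pd.DataFrame({
    "petroleum_product": "Light Diesel Oil",
    "year": list(range(2014, 1999, -1)),
    "max_sale": sales,
    "min_sale": sales,
    "avg_sale": [float(v) for v in sales],
})

# Right-closed bins: (1999, 2004], (2004, 2009], (2009, 2014]
bins = [1999, 2004, 2009, 2014]
# The left edge is excluded, so the first year inside a bin is lo + 1.
labels = [f"{lo + 1}-{hi}" for lo, hi in zip(bins, bins[1:])]

df["year_range"] = pd.cut(df["year"], bins, labels=labels)
df_res = (
    df.groupby(["petroleum_product", "year_range"], observed=True)
      .agg({"max_sale": "max", "min_sale": "min", "avg_sale": "mean"})
      .reset_index()
)
print(df_res)
```

Because pd.cut returns an ordered categorical, the default sort=True lists the groups in bin order, oldest interval first.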