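I want to compute a rolling mean of modal_price for each (APMC, Commodity) group within each year, with window_length equal to the number of months in that year. With my solution I get all NaN. The dataset looks like this: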
APMC | Commodity | qtl _weight| min_price | max_price | modal_price | district_name | Year | Month
date
2014-12-01 Akole bajri 40 1375 1750 1563 Ahmadnagar 2014 12
2014-12-01 Akole paddy-unhusked 346 1400 1800 1625 Ahmadnagar 2014 12
2014-12-01 Akole wheat 55 1500 1900 1675 Ahmadnagar 2014 12
2014-12-01 Akole bhagar/vari 59 2000 2600 2400 Ahmadnagar 2014 12
2014-12-01 Akole gram 9 3200 3300 3235 Ahmadnagar 2014 12
2014-12-01 Jamkhed cotton 44199 3950 4033 3991 Ahmadnagar 2014 12
2014-12-01 Jamkhed bajri 846 1300 1488 1394 Ahmadnagar 2014 12
2014-12-01 Jamkhed wheat(husked) 155 1879 2231 2055 Ahmadnagar 2014 12
2014-12-01 Kopar gram 421 1983 2698 2463 Ahmadnagar 2014 12
2014-12-01 Kopar greengram 18 6734 7259 6759 Ahmadnagar 2014 12
2014-12-01 Kopar soybean 1507 2945 3247 3199 Ahmadnagar 2014 12
2016-11-01 Sanga wheat(husked) 222 1730 2173 1994 Ahmadnagar 2016 11
There are 60k rows, and each (APMC, Commodity) cluster has a different number of months across the three years (2014, 2015, 2016).
Answer 0 (score: 0)
I don't know whether you need a groupby, but you could do it like this:
out = {}
for APMC_ in df.APMC.unique():
    for Commodity_ in df.Commodity.unique():
        for year_ in set(df.index.year):
            # rows for one (APMC, Commodity, year) combination
            temp = df[(df.APMC == APMC_) & (df.Commodity == Commodity_) & (df.index.year == year_)].copy()
            n_months = temp.shape[0]  # ...
            # restrict to modal_price: a mean over the string columns would fail
            out[APMC_ + Commodity_ + str(year_)] = temp.modal_price.mean()  # or whatever
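Having typed this out, though, I suspect your "number of months in that year" might not be right.
Either way, this isn't exactly what you asked for, but it may solve your problem.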
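For comparison, the same per-group numbers can be had without the nested loops. This is only a sketch on made-up toy data (the prices below are hypothetical, not from the question), and the output column names `n_months` and `mean_price` are my own labels. It uses `groupby` with named aggregation, so only (Year, APMC, Commodity) combinations that actually occur in the data produce a row:

```python
import pandas as pd

# Hypothetical toy data shaped like the question's frame (values are made up).
df = pd.DataFrame(
    {
        "APMC": ["Akole", "Akole", "Akole", "Jamkhed"],
        "Commodity": ["wheat", "wheat", "wheat", "cotton"],
        "modal_price": [1500, 1700, 1600, 4000],
    },
    index=pd.to_datetime(["2014-11-01", "2014-12-01", "2015-01-01", "2014-12-01"]),
)

# Add the year as a column, then aggregate modal_price once per group.
# n_months is the number of rows in the group, mean_price its average.
summary = (
    df.assign(Year=df.index.year)
    .groupby(["Year", "APMC", "Commodity"])["modal_price"]
    .agg(n_months="size", mean_price="mean")
)
print(summary)
```

Unlike the loop, this never builds empty frames for combinations that do not exist, which matters with 60k rows and many APMC/Commodity pairs.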
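Answer 1 (score: 0)
I think you want to group by Year x APMC x Commodity and then use .expanding().mean()
to compute a running mean within each group. Since your data appears to be monthly, this gives the mean over all months so far in each year, one value per month.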
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'Date': ['2014-11-01', '2014-11-01', '2014-11-01',
                            '2014-12-01', '2014-12-01', '2014-12-01',
                            '2015-01-01', '2015-01-01', '2015-01-01'],
                   'APMC': np.tile(['Akole', 'Jamkhed', 'Kopar'], 3),
                   'Commodity': np.tile(['wheat', 'cotton', 'gram'], 3),
                   'modal_price': np.random.randint(1000, 2000, 9)})
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')
# APMC Commodity modal_price
#Date
#2014-11-01 Akole wheat 1510
#2014-11-01 Jamkhed cotton 1365
#2014-11-01 Kopar gram 1382
#2014-12-01 Akole wheat 1322
#2014-12-01 Jamkhed cotton 1988
#2014-12-01 Kopar gram 1098
#2015-01-01 Akole wheat 1742
#2015-01-01 Jamkhed cotton 1017
#2015-01-01 Kopar gram 1595
df = df.sort_index()
df.assign(Year=df.index.year).groupby(['Year', 'APMC', 'Commodity']).modal_price.expanding().mean()
Year APMC Commodity Date
2014 Akole wheat 2014-11-01 1510.0
2014-12-01 1416.0
Jamkhed cotton 2014-11-01 1365.0
2014-12-01 1676.5
Kopar gram 2014-11-01 1382.0
2014-12-01 1240.0
2015 Akole wheat 2015-01-01 1742.0
Jamkhed cotton 2015-01-01 1017.0
Kopar gram 2015-01-01 1595.0
Name: modal_price, dtype: float64
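Because the output keeps the original DataFrame's index (as the innermost level), you can join the result back onto the original frame if you need to.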
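If the goal is simply a new column on the original frame, one option is `groupby(...).transform`, which returns a Series aligned row-for-row with the input, so no join is needed even though the Date index contains duplicates. The sketch below uses made-up prices, and the column names `expanding_mean`, `rolling_12` and `rolling_12_default` are my own. It also illustrates one common cause of all-NaN results with `rolling`. I am assuming, not asserting, that this is what happened in the question: with an integer window, `min_periods` defaults to the window size, so any group shorter than the window yields only NaN.

```python
import pandas as pd

# Hypothetical toy data, already in date order (values are made up).
df = pd.DataFrame(
    {
        "APMC": ["Akole", "Jamkhed", "Akole", "Jamkhed", "Akole"],
        "Commodity": ["wheat", "cotton", "wheat", "cotton", "wheat"],
        "modal_price": [1500, 4000, 1701, 4200, 1600],
    },
    index=pd.to_datetime(
        ["2014-11-01", "2014-11-01", "2014-12-01", "2014-12-01", "2015-01-01"]
    ),
)
df.index.name = "Date"
df = df.assign(Year=df.index.year)

grouped = df.groupby(["Year", "APMC", "Commodity"])["modal_price"]

# Running mean within each group, written straight back as a column.
df["expanding_mean"] = grouped.transform(lambda s: s.expanding().mean())

# A 12-month rolling window with min_periods=1 gives the same values
# as long as no group has more than 12 rows.
df["rolling_12"] = grouped.transform(
    lambda s: s.rolling(window=12, min_periods=1).mean()
)

# Without min_periods, every group here is shorter than the window,
# so the whole column is NaN.
df["rolling_12_default"] = grouped.transform(lambda s: s.rolling(window=12).mean())

print(df)
```

Whether `expanding()` or `rolling(..., min_periods=1)` is the better fit depends on whether a year can ever hold more rows per group than the window length.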