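I have this time-series data, and I want to use 'modal_price' to determine the type of trend/seasonality (multiplicative or additive) for each cluster of APMC and commodity. The dataset has about 60,000 rows like these, where the same (APMC, Commodity) cluster recurs with different dates. The dataset looks like this: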
APMC | Commodity | qtl_weight | min_price | max_price | modal_price | district_name | Year | Month
date
2014-12-01 Akole bajri 40 1375 1750 1563 Ahmadnagar 2014 12
2014-12-01 Akole paddy-unhusked 346 1400 1800 1625 Ahmadnagar 2014 12
2014-12-01 Akole wheat 55 1500 1900 1675 Ahmadnagar 2014 12
2014-12-01 Akole bhagar/vari 59 2000 2600 2400 Ahmadnagar 2014 12
2014-12-01 Akole gram 9 3200 3300 3235 Ahmadnagar 2014 12
2014-12-01 Jamkhed cotton 44199 3950 4033 3991 Ahmadnagar 2014 12
2014-12-01 Jamkhed bajri 846 1300 1488 1394 Ahmadnagar 2014 12
2014-12-01 Jamkhed wheat(husked) 155 1879 2231 2055 Ahmadnagar 2014 12
2014-12-01 Kopar gram 421 1983 2698 2463 Ahmadnagar 2014 12
2014-12-01 Kopar greengram 18 6734 7259 6759 Ahmadnagar 2014 12
2014-12-01 Kopar soybean 1507 2945 3247 3199 Ahmadnagar 2014 12
2016-11-01 Sanga wheat(husked) 222 1730 2173 1994 Ahmadnagar 2016 11
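I tried building a pivot table with APMC, Commodity and date as the index, but that does not help with computing the mean of each cluster (APMC, Commodity), which I need for the trend. All I need to know is how to use 'modal_price' to compute the mean for each (APMC, Commodity) cluster and store it as a column in the dataframe / pivot table.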
Answer 0 (score: 0)
Perhaps groupby will give you what you need for the trend, and transform will then let you project it back onto the same index? Something like this:
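# group by your cluster (df is the DataFrame from the question);
# including "Year" gives a separate mean for each year within a cluster
g = df.groupby(["Year", "APMC", "Commodity"])
# determine the trend per cluster, but project it back onto the original dimensions
trend = g["modal_price"].transform("mean")
df["trend"] = trend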
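For a self-contained illustration, here is a minimal sketch of the same groupby/transform idea on a toy frame. The prices below are made up for the example and are not taken from the question's data. It groups by (APMC, Commodity) only, as the question asks, rather than by year as well. The names cluster_mean, additive_resid and multiplicative_ratio are hypothetical. The last two columns are just one possible starting point for comparing an additive view (difference from the mean) with a multiplicative view (ratio to the mean), not a full decomposition.

```python
import pandas as pd

# Toy data using the question's column names; the values are invented.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-12-01", "2015-01-01", "2014-12-01", "2015-01-01"]
        ),
        "APMC": ["Akole", "Akole", "Jamkhed", "Jamkhed"],
        "Commodity": ["bajri", "bajri", "cotton", "cotton"],
        "modal_price": [1500, 1700, 4000, 4200],
    }
).set_index("date")

# Mean modal_price per (APMC, Commodity) cluster, broadcast back to every row
# so it can be stored as an ordinary column.
df["cluster_mean"] = (
    df.groupby(["APMC", "Commodity"])["modal_price"].transform("mean")
)

# Two ways to remove that level from the series:
# a difference (additive view) and a ratio (multiplicative view).
df["additive_resid"] = df["modal_price"] - df["cluster_mean"]
df["multiplicative_ratio"] = df["modal_price"] / df["cluster_mean"]

print(df)
```

Because transform returns one value per input row, the result lines up with the original rows even though the date index contains duplicates.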