Question

I have a DataFrame containing dates and the number of downloads per user:

dates           downloadsperuser
2004-01-02  12.51118760757315
2004-01-03  6.990049751243781
2004-01-04  6.8099547511312215
2004-01-05  22.513349514563107
2004-01-06  22.348538011695908
2004-01-07  23.895180722891567
2004-01-08  21.765680473372782
2004-01-09  20.34256926952141
2004-01-10  9.455938697318008
...
2004-02-01  9.196078431372548
2004-02-02  21.558398220244715
2004-02-03  22.293007769145394
2004-02-04  22.324115044247787
2004-02-05  21.88482834994463
2004-02-06  20.236781609195404
2004-02-07  10.708823529411765
2004-02-08  10.835329341317365
2004-02-09  24.87350054525627
2004-02-10  24.167035398230087
2004-02-11  22.676117775354417
2004-02-12  23.384444444444444
2004-02-13  20.674285714285713
2004-02-14  10.74914089347079
2004-02-15  11.64873417721519
...
2004-03-01  23.36965811965812
2004-03-02  23.127545551982852
2004-03-03  23.60235798499464
2004-03-04  23.634015069967706
2004-03-05  20.468996617812852
2004-03-06  6.608208955223881
2004-03-07  5.570446735395189
2004-03-08  23.48093220338983
2004-03-09  25.734190782422292
2004-03-10  24.919652551574377
...

I want to calculate the monthly average. So far I have tried:

import pandas as pd

df = pd.read_csv('downloadsperuser.csv', parse_dates=True)
df['dates'] = pd.to_datetime(df['dates'])
# Tag each row with its calendar month, then count the rows (days) per month
df['month'] = pd.PeriodIndex(df.dates, freq='M')
df['month'].value_counts().sort_index()

This gives me the number of days in each month, but I don't know how to aggregate all the values in the downloadsperuser column by month.

Answer 1

You can try:

# test input
set.seed(123)
x <- sample(20, 20)
d <- c(.2, .3, .5) # assume in increasing order

o <- order(x)
b <- findInterval(cumsum(d) * sum(x), cumsum(x[o]))
g <- rep(seq_along(d), diff(c(0, b)))[order(o)]

# check distribution of result
tapply(x, g, sum) / sum(x)
##         1         2         3 
## 0.1714286 0.3285714 0.5000000

Answer 2

First extract the month and year, then group by both to find the mean:

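# The column in the question is named 'dates', not 'date'
df['month'] = pd.to_datetime(df['dates']).dt.month
df['year'] = pd.to_datetime(df['dates']).dt.year
# Average only the downloads column within each (year, month) group
df.groupby(['year', 'month'], as_index=False)['downloadsperuser'].mean()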
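
A variation on the same idea, as a minimal sketch: instead of separate year and month columns, group on a single monthly period, which is close to what the question already built with PeriodIndex. The inline sample below is an assumption made for illustration. It copies a few rows from the question and assumes a comma-separated layout, which may not match the real downloadsperuser.csv.

```python
import io

import pandas as pd

# A few rows copied from the question. The comma-separated layout is an
# assumption; the real 'downloadsperuser.csv' may be delimited differently.
sample = io.StringIO(
    "dates,downloadsperuser\n"
    "2004-01-02,12.51118760757315\n"
    "2004-01-03,6.990049751243781\n"
    "2004-02-01,9.196078431372548\n"
    "2004-02-02,21.558398220244715\n"
    "2004-03-01,23.36965811965812\n"
)
df = pd.read_csv(sample, parse_dates=["dates"])

# Group on a monthly Period so year and month stay together in one key,
# then average the downloads within each month.
monthly = df.groupby(df["dates"].dt.to_period("M"))["downloadsperuser"].mean()
print(monthly)
```

The result is a Series indexed by month (2004-01, 2004-02, ...), so data spanning several years is not merged into the same month number.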

How do I calculate a monthly average using Python?
