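I'm new to Pandas with Python and I'm stuck. I'm working on a cohort analysis. I managed to print the monthly user retention per cohort, but I really don't know how to count active users per quarter.
For my monthly cohorts I have the following DataFrame, which works well: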
import numpy as np
import pandas as pd

df_cohort_month = pd.DataFrame({'client': id_client,
                                'charge': charge,
                                'month_date_charge': quarter_date_charge,
                                'active_monthly_user': active_monthly_user,
                                'user_applying_date': user_applying_date})
print(df_cohort_month.head())
   client  charge month_date_charge  active_user
0     105     1.0           2015-07            1
1     105     0.0           2015-08            0
2     105     0.0           2015-09            0
3     105     0.0           2015-10            0
4     105     0.0           2015-11            0
Here is my code to print the monthly cohorts:
# Label each row with the month in which the user applied (the cohort).
df_cohort_month['Cohort_Group_month'] = df_cohort_month.user_applying_date.apply(lambda x: x.strftime('%Y-%m'))

# Aggregate per cohort and per charge month.
cohorts = df_cohort_month.groupby(['Cohort_Group_month', 'month_date_charge']).agg({
    'charge': 'sum',
    'client': 'nunique',
    'active_monthly_user': 'sum'})

def cohort_period(df):
    # Number the months 1, 2, 3, ... within one cohort.
    df['Cohort_period'] = np.arange(len(df)) + 1
    return df

cohorts = cohorts.groupby(level=0, group_keys=False).apply(cohort_period)
cohorts.reset_index(inplace=True)
cohorts.set_index(['Cohort_Group_month', 'Cohort_period'], inplace=True)
cohorts.rename(columns={'client': 'Newclients'}, inplace=True)
print(cohorts['active_monthly_user'].unstack(0).head(10))
Cohort_Group_month  2015-07  2015-08  2015-09  ...  2018-04  2018-05  2018-06
Cohort_period                                  ...
1                       1.0      1.0      0.0  ...      1.0      1.0      1.0
2                       0.0      1.0      0.0  ...      1.0      1.0      NaN
3                       0.0      1.0      0.0  ...      1.0      NaN      NaN
4                       0.0      1.0      0.0  ...      NaN      NaN      NaN
5                       0.0      1.0      0.0  ...      NaN      NaN      NaN
6                       1.0      1.0      0.0  ...      NaN      NaN      NaN
7                       0.0      1.0      0.0  ...      NaN      NaN      NaN
8                       1.0      1.0      0.0  ...      NaN      NaN      NaN
9                       0.0      1.0      0.0  ...      NaN      NaN      NaN
10                      1.0      1.0      0.0  ...      NaN      NaN      NaN
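The period numbering and the pivot that produce a table like the one above can be sketched on a small, made-up frame. This is only an illustration: the `toy` data is hypothetical, and `groupby(...).cumcount()` is swapped in as an alternative to the `apply`-based helper, not the code used in this post.

```python
import pandas as pd

# Hypothetical aggregated rows: one row per (cohort, charge month), already sorted.
toy = pd.DataFrame({
    "Cohort_Group_month": ["2015-07", "2015-07", "2015-07", "2015-08", "2015-08"],
    "month_date_charge": ["2015-07", "2015-08", "2015-09", "2015-08", "2015-09"],
    "active_monthly_user": [1, 0, 0, 1, 1],
})

# Number the rows 1, 2, 3, ... inside each cohort.
toy["Cohort_period"] = toy.groupby("Cohort_Group_month").cumcount() + 1

# Cohorts become columns, periods become rows.
table = (
    toy.set_index(["Cohort_Group_month", "Cohort_period"])["active_monthly_user"]
    .unstack(0)
)
print(table)
```

Younger cohorts have fewer periods, which is why the lower right corner of such a table fills with NaN.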
I created a second DataFrame for the quarterly cohorts:
df_cohort_quarter = pd.DataFrame({'client': id_client,
                                  'charge': transaction,
                                  'quarter_date_charge': quarter_date_charge,
                                  'active_monthly_user': active_monthly_user,
                                  'user_applying_date': user_applying_date,
                                  })
# Build labels such as '2015Q3' for the cohort quarter and the charge quarter.
df_cohort_quarter['Cohort_Group_quarter'] = df_cohort_quarter.user_applying_date.apply(lambda x: str(x.year) + 'Q' + str(x.quarter))
df_cohort_quarter['quarter_date_charge'] = df_cohort_quarter.quarter_date_charge.apply(lambda x: str(x.year) + 'Q' + str(x.quarter))
print(df_cohort_quarter.head(5))
   client  ... Cohort_Group_quarter
0     105  ...               2015Q3
1     105  ...               2015Q3
2     105  ...               2015Q3
3     105  ...               2015Q3
4     105  ...               2015Q3
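For reference, the same quarter labels can be built without `apply`. The snippet below is a minimal sketch on hypothetical sample dates (the real values would come from 'user_applying_date'), and it assumes the column holds datetime values.

```python
import pandas as pd

# Hypothetical sample dates standing in for user_applying_date.
dates = pd.Series(pd.to_datetime(["2015-07-15", "2015-10-01", "2016-01-31"]))

# Same label format as str(x.year) + 'Q' + str(x.quarter), built column-wise.
labels = dates.dt.year.astype(str) + "Q" + dates.dt.quarter.astype(str)

# Equivalent labels obtained from quarterly periods.
period_labels = dates.dt.to_period("Q").astype(str)

print(labels.tolist())
print(period_labels.tolist())
```

Both variants avoid a Python-level lambda per row, which tends to matter on large frames.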
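I want to create a column named 'active_quarter_user' that uses 'active_monthly_user' to count the users who are active in each quarter, but I really don't know how to do it.
I tried a few ideas with loops and conditions, but they didn't work and took a lot of time.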
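One possible reading of this requirement, sketched on made-up data without any loop: a client counts as active in a quarter if 'active_monthly_user' is 1 in at least one month of that quarter. That rule is an assumption, and the `monthly` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly activity, shaped like df_cohort_month.
monthly = pd.DataFrame({
    "client": [105, 105, 105, 105, 106, 106, 106, 106],
    "month_date_charge": ["2015-07", "2015-08", "2015-09", "2015-10",
                          "2015-07", "2015-08", "2015-09", "2015-10"],
    "active_monthly_user": [1, 0, 1, 0, 0, 0, 0, 1],
})

# Turn each 'YYYY-MM' label into a quarter label such as '2015Q3'.
month_start = pd.to_datetime(monthly["month_date_charge"], format="%Y-%m")
monthly["quarter_date_charge"] = month_start.dt.to_period("Q").astype(str)

# Assumption: active in the quarter if active in at least one of its months.
quarterly = (
    monthly.groupby(["client", "quarter_date_charge"], as_index=False)["active_monthly_user"]
    .max()
    .rename(columns={"active_monthly_user": "active_quarter_user"})
)

# Number of distinct active clients per quarter.
active_per_quarter = quarterly.groupby("quarter_date_charge")["active_quarter_user"].sum()

print(quarterly)
print(active_per_quarter)
```

Taking the maximum per client and quarter means a client who is active in several months of the same quarter is still counted once.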