Pandas cohort analysis: monthly, then per quarter/semester/year

Time: 2018-07-13 11:30:07

Tags: python pandas dataframe

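I'm new to pandas in Python and I'm stuck. I'm working on a cohort analysis. I managed to print the monthly user retention per cohort, but I really don't know how to count active users per quarter.

My monthly cohort uses the following DataFrame, which works fine: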

import numpy as np
import pandas as pd

df_cohort_month = pd.DataFrame({
    'client': id_client,
    'charge': charge,
    'month_date_charge': quarter_date_charge,
    'active_monthly_user': active_monthly_user,
    'user_applying_date': user_applying_date,
})

print(df_cohort_month.head())

   client  charge month_date_charge  active_user
0     105     1.0           2015-07            1
1     105     0.0           2015-08            0
2     105     0.0           2015-09            0
3     105     0.0           2015-10            0
4     105     0.0           2015-11            0

Here is my code to print the monthly cohort:

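# Label each user with the month in which they applied (their cohort)
df_cohort_month['Cohort_Group_month'] = df_cohort_month.user_applying_date.apply(lambda x: x.strftime('%Y-%m'))
cohorts = df_cohort_month.groupby(['Cohort_Group_month', 'month_date_charge']).agg({
    'charge': 'sum',               # total charged per cohort and month
    'client': 'nunique',           # distinct clients per cohort and month
    'active_monthly_user': 'sum',  # active users per cohort and month
})

def cohort_period(df_cohort_month):
    # Number the periods of one cohort 1, 2, 3, ... in row order
    df_cohort_month['Cohort_period'] = np.arange(len(df_cohort_month)) + 1
    return df_cohort_month

# group_keys=False keeps the original index, so reset_index() below does not
# run into a duplicated 'Cohort_Group_month' level on recent pandas versions
cohorts = cohorts.groupby(level=0, group_keys=False).apply(cohort_period)

cohorts.reset_index(inplace=True)
cohorts.set_index(['Cohort_Group_month', 'Cohort_period'], inplace=True)
cohorts.rename(columns={'client': 'Newclients'}, inplace=True)
print(cohorts['active_monthly_user'].head(10).unstack(0))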

Cohort_Group_month  2015-07  2015-08  2015-09  ...  2018-04  2018-05  2018-06
Cohort_period                                  ...
1                       1.0      1.0      0.0  ...      1.0      1.0      1.0
2                       0.0      1.0      0.0  ...      1.0      1.0      NaN
3                       0.0      1.0      0.0  ...      1.0      NaN      NaN
4                       0.0      1.0      0.0  ...      NaN      NaN      NaN
5                       0.0      1.0      0.0  ...      NaN      NaN      NaN
6                       1.0      1.0      0.0  ...      NaN      NaN      NaN
7                       0.0      1.0      0.0  ...      NaN      NaN      NaN
8                       1.0      1.0      0.0  ...      NaN      NaN      NaN
9                       0.0      1.0      0.0  ...      NaN      NaN      NaN
10                      1.0      1.0      0.0  ...      NaN      NaN      NaN
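
As a side note, the period numbering done by `cohort_period` can probably be written without `apply`. The sketch below is only an illustration on a small made-up table (the index values and counts are hypothetical, not the real data); it relies on `GroupBy.cumcount`, which numbers the rows of each group starting at 0.

```python
import pandas as pd

# Hypothetical aggregated cohort table, indexed by (cohort month, charge month)
index = pd.MultiIndex.from_tuples(
    [("2015-07", "2015-07"), ("2015-07", "2015-08"), ("2015-07", "2015-09"),
     ("2015-08", "2015-08"), ("2015-08", "2015-09")],
    names=["Cohort_Group_month", "month_date_charge"],
)
cohorts = pd.DataFrame({"active_monthly_user": [1, 0, 0, 1, 1]}, index=index)

# cumcount() numbers the rows inside each cohort from 0, so add 1
cohorts["Cohort_period"] = cohorts.groupby(level=0).cumcount() + 1

# One column per cohort, one row per period
table = (
    cohorts.reset_index()
           .set_index(["Cohort_Group_month", "Cohort_period"])["active_monthly_user"]
           .unstack(0)
)
print(table)
```

This assumes the rows of each cohort are already sorted by charge month, which is what `groupby(...).agg(...)` produces by default.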

I created a second DataFrame for the quarterly cohort:

df_cohort_quarter = pd.DataFrame({
    'client': id_client,
    'charge': transaction,
    'quarter_date_charge': quarter_date_charge,
    'active_monthly_user': active_monthly_user,
    'user_applying_date': user_applying_date,
})

# Build labels such as '2015Q3' from the year and quarter of each date
df_cohort_quarter['Cohort_Group_quarter'] = df_cohort_quarter.user_applying_date.apply(lambda x: str(x.year) + 'Q' + str(x.quarter))
df_cohort_quarter['quarter_date_charge'] = df_cohort_quarter.quarter_date_charge.apply(lambda x: str(x.year) + 'Q' + str(x.quarter))
print(df_cohort_quarter.head(5))

   client          ...           Cohort_Group_quarter
0     105          ...                         2015Q3
1     105          ...                         2015Q3
2     105          ...                         2015Q3
3     105          ...                         2015Q3
4     105          ...                         2015Q3
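
For reference, quarter labels of this form can likely be produced without a row-wise `apply`, assuming the date columns hold real datetimes. The sketch below uses made-up rows, and the `date_charge` column name is hypothetical; `Series.dt.to_period('Q')` maps each timestamp to its calendar quarter, and `astype(str)` turns the periods into strings.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real columns
df = pd.DataFrame({
    "client": [105, 105, 106],
    "user_applying_date": pd.to_datetime(["2015-07-03", "2015-07-03", "2016-01-15"]),
    "date_charge": pd.to_datetime(["2015-07-10", "2015-11-02", "2016-04-20"]),
})

# Vectorised quarter labels for the cohort and for each charge
df["Cohort_Group_quarter"] = df["user_applying_date"].dt.to_period("Q").astype(str)
df["quarter_date_charge"] = df["date_charge"].dt.to_period("Q").astype(str)
print(df[["client", "Cohort_Group_quarter", "quarter_date_charge"]])
```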

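I want to create another column named "active_quarter_user" that uses "active_monthly_user" to count each user only once per quarter, but I really don't know how to do it.

I tried a few ideas with loops and conditions, but they didn't work and took a lot of time.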
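
One possible way to avoid loops, sketched here under assumptions: treat a client as active in a quarter if they were active in at least one month of that quarter, which amounts to taking the maximum of `active_monthly_user` per client and quarter. The data below is made up, and the column layout (one row per client per month, with a 0/1 flag) is assumed from the printed head of the monthly frame.

```python
import pandas as pd

# Hypothetical monthly activity: one row per client per month, flag is 0 or 1
df = pd.DataFrame({
    "client": [105, 105, 105, 105, 106, 106, 106],
    "month_date_charge": ["2015-07", "2015-08", "2015-09", "2015-10",
                          "2015-07", "2015-08", "2015-10"],
    "active_monthly_user": [1, 0, 0, 0, 0, 0, 1],
})

# Derive the quarter label (e.g. '2015Q3') from the 'YYYY-MM' month string
df["quarter_date_charge"] = (
    pd.to_datetime(df["month_date_charge"], format="%Y-%m")
      .dt.to_period("Q")
      .astype(str)
)

# Active in the quarter = active in at least one of its months
df["active_quarter_user"] = (
    df.groupby(["client", "quarter_date_charge"])["active_monthly_user"]
      .transform("max")
)

# Collapse to one row per client and quarter so each user is counted once
quarterly = (
    df.groupby(["client", "quarter_date_charge"], as_index=False)["active_monthly_user"]
      .max()
      .rename(columns={"active_monthly_user": "active_quarter_user"})
)
active_per_quarter = quarterly.groupby("quarter_date_charge")["active_quarter_user"].sum()
print(active_per_quarter)
```

Summing `active_quarter_user` directly on the monthly rows would count a client up to three times per quarter, which is why the sketch collapses to one row per client and quarter before summing. If the frame already carries a quarter column, the `to_datetime` step can be skipped.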

0 answers:

No answers yet