Given a df:
session_id article session_type
1 a req
1 b req
1 null action
2 home req
2 h req
2 j req
2 home req
3 home req
3 home req
3 r req
3 home req
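I want to aggregate into a single column that holds, for each session, a dict with:
1. the number of unique articles (not counting 'home' or 'null')
2. the number of unique session_type values
3. the count of all non-consecutive 'home' articles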
Expected output:
sess_id agg_col
1 {unique_articles:2,unique_promotion_session:2,non_consectutive_home:0}
2 {unique_articles:2,unique_promotion_session:1,non_consectutive_home:2}
3 {unique_articles:1,unique_promotion_session:1,non_consectutive_home:1}
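To make the example reproducible, the sample frame can be rebuilt as below. This is a sketch that assumes 'null' is the literal string 'null' (which is what the answer's eq('null') check expects) rather than a real missing value. If it is a real NaN, nunique skips it anyway.

```python
import pandas as pd

# Sample data from the question; 'null' is assumed to be the literal string 'null'
df = pd.DataFrame({
    'session_id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'article': ['a', 'b', 'null', 'home', 'h', 'j', 'home',
                'home', 'home', 'r', 'home'],
    'session_type': ['req', 'req', 'action'] + ['req'] * 8,
})
print(df)
```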
Thanks.
Answer 0 (score: 1)
Use:
import pandas as pd

# boolean mask: True where article is 'home'
m = df['article'].eq('home')
# label runs of equal consecutive mask values (shifting within each session_id so a run never crosses sessions), then keep only the 'home' rows
s = m.ne(m.groupby(df['session_id']).shift()).cumsum()[m]
# a 'home' run of length 1 is a non-consecutive home: flag it with 1, longer runs with 0
df['home'] = s.map(s.value_counts()).eq(1).astype(int)
# replace 'home' and 'null' with NaN so nunique ignores them
df['article'] = df['article'].mask(m | df['article'].eq('null'))
# rows that are not 'home' got NaN when s was aligned back to df; count them as 0
df['home'] = df['home'].fillna(0).astype(int)
# per session: number of unique non-NaN values, and the sum of the 1-flags
df1 = df.groupby('session_id').agg({'article':'nunique',
'session_type':'nunique',
'home':'sum'})
df1 = df1.rename(columns={'article':'unique_articles',
'session_type':'unique_promotion_session',
'home':'non_consectutive_home'})
print (df1)
unique_articles unique_promotion_session non_consectutive_home
session_id
1 2 2 0
2 2 1 2
3 1 1 1
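The ne/shift/cumsum step is the least obvious part. Below is a minimal sketch of the same run-length idea on a single session's mask (session 3: home, home, r, home); with only one session a plain shift() is enough.

```python
import pandas as pd

# Mask for session 3 from the question: home, home, r, home
m = pd.Series([True, True, False, True])

# A new run starts whenever the value differs from the previous row;
# shift() puts NaN in row 0, and True != NaN, so row 0 starts run 1
run_id = m.ne(m.shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3]

# Keep only the 'home' rows and replace each run id by that run's length
home_runs = run_id[m]
run_len = home_runs.map(home_runs.value_counts())
print(run_len.tolist())  # [2, 2, 1]

# Rows in a run of length 1 are the non-consecutive homes
print(int(run_len.eq(1).sum()))  # 1
```

With several sessions in one frame, the answer shifts within each session (m.groupby(df['session_id']).shift()). That matters for this data: session 2 ends with 'home' and session 3 starts with 'home', so a plain shift() over the whole frame would merge them into one run, and session 2 would get 1 instead of 2.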
# build a DataFrame with one dict per session in agg_col
d = df1.to_dict('index')
df2 = pd.DataFrame({'sess_id': list(d.keys()),
'agg_col': list(d.values())})
print (df2)
sess_id agg_col
0 1 {'unique_articles': 2, 'unique_promotion_sessi...
1 2 {'unique_articles': 2, 'unique_promotion_sessi...
2 3 {'unique_articles': 1, 'unique_promotion_sessi...
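Because df1 is already indexed by session_id, the dict column can also be built by pairing the index with to_dict('records'), which returns one dict per row in row order. A sketch, with df1 re-created from the printed values above so it runs on its own (column names kept exactly as in the question):

```python
import pandas as pd

# df1 as printed above: one row per session_id
df1 = pd.DataFrame(
    {'unique_articles': [2, 2, 1],
     'unique_promotion_session': [2, 1, 1],
     'non_consectutive_home': [0, 2, 1]},
    index=pd.Index([1, 2, 3], name='session_id'),
)

# to_dict('records') yields one dict per row, in the same order as df1.index
df2 = pd.DataFrame({'sess_id': df1.index.to_list(),
                    'agg_col': df1.to_dict('records')})
print(df2)
```

This gives the same sess_id/agg_col frame as the to_dict('index') version; it just avoids going through an intermediate dict keyed by session.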