I have a transactions dataframe (15k rows):
customer_id  order_id  order_date  var1  var2  product_id  product_type_id  designer_id  gross_spend  net_spend
         79    822067  1990-10-21     0     0       51818              139          322        0.174      0.174
         79    771456  1990-11-29     0     0      580866              139         2366        1.236      1.236
         79    771456  1990-11-29     0     0      924147              432          919        0.205      0.205
        156    720709  1990-06-08     0     0      167205              474         4792        0.374      0.374
        156    720709  1990-06-08     0     0      132120              164         2243        0.278      0.278
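For each customer, I want to group the rows by product_type_id and by time window. To be more precise: for each customer_id, I want to know how many times the customer bought from the same category in the last 30, 60, 90, 120, 150, 180 and 360 days, counting back from a reference date (for example 1991-01-01).
For each customer I would also like to know the total number of purchases, the number of distinct product_type_id values bought, and the total net_spend.
I am not sure how to reduce the data to a flat pandas dataframe with one row per customer_id.
I can get a simplified view with the following: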
import datetime as dt
transactions['order_date'] = transactions['order_date'].apply(lambda x: dt.datetime.strptime(x,"%Y-%m-%d"))
# reference date; Python 3 rejects zero-padded integer literals such as 01
NOW = dt.datetime(1991, 1, 1)
Table = transactions.groupby('customer_id').agg({ 'order_date': lambda x: (NOW - x.max()).days,'order_id': lambda x: len(set(x)), 'net_spend': lambda x: x.sum()})
Table.rename(columns={'order_date': 'Recency', 'order_id': 'Frequency', 'net_spend': 'Monetization'}, inplace=True)
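The same recency / frequency / spend summary can also be written with pandas named aggregation, which names the output columns directly and makes the rename step unnecessary. This is only a sketch: the five-row frame below is rebuilt from the sample rows shown above, and the names `now` and `rfm` are illustrative, not part of the original code.

```python
import pandas as pd

# The five sample rows from the question (only the columns used here).
transactions = pd.DataFrame({
    "customer_id": [79, 79, 79, 156, 156],
    "order_id": [822067, 771456, 771456, 720709, 720709],
    "order_date": pd.to_datetime(
        ["1990-10-21", "1990-11-29", "1990-11-29", "1990-06-08", "1990-06-08"]
    ),
    "net_spend": [0.174, 1.236, 0.205, 0.374, 0.278],
})

# Reference date taken from the question.
now = pd.Timestamp("1991-01-01")

# One row per customer; each keyword becomes an output column.
rfm = transactions.groupby("customer_id").agg(
    Recency=("order_date", lambda s: (now - s.max()).days),  # days since last order
    Frequency=("order_id", "nunique"),                       # distinct orders
    Monetization=("net_spend", "sum"),                       # total net spend
)
print(rfm)
```

On the sample rows, customer 79 has two distinct orders and customer 156 has one, since several rows share an order_id.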
Answer 0 (score: 0)
Use:
import pandas as pd

# df is the transactions frame; order_date must already be a datetime column
date = '1991-01-01'
last = [30,60,90]
#get the start of each window: the reference date shifted back by each value in last
a = [pd.to_datetime(date) - pd.Timedelta(x, unit='d') for x in last]
d1 = {}
#create one 0/1 indicator column per window with between (both ends inclusive)
for i, x in enumerate(a):
    df['last_' + str(last[i])] = df['order_date'].between(x, date).astype(int)
    #register the new column in the dictionary used for aggregation
    d1['last_' + str(last[i])] = 'sum'
#aggregating dictionary: row count, distinct categories, total spend
d = {'customer_id':'size', 'product_type_id':'nunique', 'net_spend':'sum'}
#add d1 to d
d.update(d1)
print (d)
{'product_type_id': 'nunique', 'last_30': 'sum', 'net_spend': 'sum',
'last_60': 'sum', 'customer_id': 'size', 'last_90': 'sum'}
df1 = df.groupby('customer_id').agg(d)
#change order of columns if necessary
cs = df1.columns
m = cs.str.startswith('last')
cols = cs[~m].tolist() + cs[m].tolist()
df1 = df1.reindex(columns=cols)
print (df1)
             product_type_id  net_spend  customer_id  last_30  last_60  last_90
customer_id
79                         2      1.615            3        0        2        3
156                        2      0.652            2        0        0        0
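The counts above are per customer across all categories. The question also asks for counts within the same category, over all seven windows. A minimal sketch of one way to get that while keeping one row per customer is below. It assumes the sample rows from the question; the names `windows`, `age`, `flags` and `per_cat` and the `last_N_type_T` column naming are illustrative choices, not part of the original answer.

```python
import pandas as pd

# The five sample rows from the question (only the columns needed here).
df = pd.DataFrame({
    "customer_id": [79, 79, 79, 156, 156],
    "order_date": pd.to_datetime(
        ["1990-10-21", "1990-11-29", "1990-11-29", "1990-06-08", "1990-06-08"]
    ),
    "product_type_id": [139, 139, 432, 474, 164],
})

now = pd.Timestamp("1991-01-01")
windows = [30, 60, 90, 120, 150, 180, 360]

# Age of each row in days, relative to the reference date.
age = (now - df["order_date"]).dt.days

# One 0/1 flag column per window: 1 if the row is at most n days old.
flags = pd.DataFrame(
    {f"last_{n}": age.between(0, n).astype(int) for n in windows}
)

# Sum the flags per (customer, category), then move the category level
# into the columns so that each customer ends up on a single row.
per_cat = (
    pd.concat([df[["customer_id", "product_type_id"]], flags], axis=1)
    .groupby(["customer_id", "product_type_id"])
    .sum()
    .unstack("product_type_id", fill_value=0)
)

# Flatten the two-level columns, e.g. ("last_60", 139) -> "last_60_type_139".
per_cat.columns = [f"{w}_type_{t}" for w, t in per_cat.columns]
print(per_cat)
```

The result has one column per (window, category) pair, so with many distinct product_type_id values it becomes very wide. In that case it may be more practical to keep the long form indexed by customer_id and product_type_id, before the unstack step.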