Question

I have a transactions DataFrame (15k rows):

customer_id  order_id  order_date  var1  var2  product_id  product_type_id  designer_id  gross_spend  net_spend
         79    822067  1990-10-21     0     0       51818              139          322        0.174      0.174
         79    771456  1990-11-29     0     0      580866              139         2366        1.236      1.236
         79    771456  1990-11-29     0     0      924147              432          919        0.205      0.205
        156    720709  1990-06-08     0     0      167205              474         4792        0.374      0.374
        156    720709  1990-06-08     0     0      132120              164         2243        0.278      0.278

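I want to group the transactions by each customer's product_type_id and by transaction time. More precisely, for each customer_id I want to know how many times the customer bought from the same category in the last 30, 60, 90, 120, 150, 180 and 360 days, counting back from a reference date (e.g. 1991-01-01).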

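For each customer I would also like to know how many purchases they have made in total, how many distinct product_type_id values they have bought from, and their total net_spend.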

I'm not sure how to reduce the data to a flat pandas DataFrame with one row per customer_id...
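For the per-category part of the question, one option is to flag each row once per look-back window, count per (customer, category), and then spread the categories into columns with `unstack`. This is a minimal sketch, assuming order_date is already datetime64, that windows are inclusive and counted back from 1991-01-01, and using only the five sample rows; the names `WINDOWS`, `flags`, `per_cat`, `flat` and the `last_<N>_cat_<product_type_id>` column scheme are illustrative choices, not anything from the original code.

```python
import pandas as pd

# The five sample rows from the question (only the columns needed here)
transactions = pd.DataFrame({
    'customer_id': [79, 79, 79, 156, 156],
    'order_id': [822067, 771456, 771456, 720709, 720709],
    'order_date': pd.to_datetime(['1990-10-21', '1990-11-29', '1990-11-29',
                                  '1990-06-08', '1990-06-08']),
    'product_type_id': [139, 139, 432, 474, 164],
    'net_spend': [0.174, 1.236, 0.205, 0.374, 0.278],
})

NOW = pd.Timestamp('1991-01-01')             # reference date (assumed)
WINDOWS = [30, 60, 90, 120, 150, 180, 360]   # look-back windows in days

# Age of each transaction in whole days before NOW
age = (NOW - transactions['order_date']).dt.days

# One 0/1 column per window: 1 if the row falls within the last N days (inclusive)
flags = transactions[['customer_id', 'product_type_id']].assign(
    **{f'last_{n}': ((age >= 0) & (age <= n)).astype(int) for n in WINDOWS}
)

# Count purchases per (customer, category) for every window
per_cat = flags.groupby(['customer_id', 'product_type_id']).sum()

# Move categories into columns; customer/category pairs with no rows become 0
flat = per_cat.unstack('product_type_id', fill_value=0)

# Flatten the (window, category) column MultiIndex into single strings
flat.columns = [f'{w}_cat_{c}' for w, c in flat.columns]
print(flat)
```

The result has one row per customer and len(WINDOWS) × (number of categories) columns, so with many product types it may be wiser to keep `per_cat` in long form instead.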

I can get a simplified view with the following:

import datetime as dt
import pandas as pd

# parse the 'YYYY-MM-DD' strings into datetime64 values
transactions['order_date'] = pd.to_datetime(transactions['order_date'], format="%Y-%m-%d")

NOW = dt.datetime(1991, 1, 1)

# Recency: days since the last order; Frequency: distinct orders; Monetization: total net spend
Table = transactions.groupby('customer_id').agg({'order_date': lambda x: (NOW - x.max()).days, 'order_id': lambda x: len(set(x)), 'net_spend': lambda x: x.sum()})

Table.rename(columns={'order_date': 'Recency', 'order_id': 'Frequency', 'net_spend': 'Monetization'}, inplace=True)
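The same Recency/Frequency/Monetization table can be built with pandas named aggregation, which names the output columns directly and replaces the lambdas with the built-in 'nunique' and 'sum'. A minimal sketch on the sample rows, assuming order_date is datetime64; the extra `n_product_types` column (distinct categories per customer, as asked above) is a hypothetical name.

```python
import pandas as pd

# The five sample rows from the question (only the columns needed here)
transactions = pd.DataFrame({
    'customer_id': [79, 79, 79, 156, 156],
    'order_id': [822067, 771456, 771456, 720709, 720709],
    'order_date': pd.to_datetime(['1990-10-21', '1990-11-29', '1990-11-29',
                                  '1990-06-08', '1990-06-08']),
    'product_type_id': [139, 139, 432, 474, 164],
    'net_spend': [0.174, 1.236, 0.205, 0.374, 0.278],
})

NOW = pd.Timestamp('1991-01-01')

# Named aggregation: output_name=(input_column, function)
table = transactions.groupby('customer_id').agg(
    Recency=('order_date', lambda s: (NOW - s.max()).days),  # days since last order
    Frequency=('order_id', 'nunique'),                        # distinct orders
    Monetization=('net_spend', 'sum'),                        # total net spend
    n_product_types=('product_type_id', 'nunique'),          # distinct categories
)
print(table)
```

Here Frequency counts distinct orders, as in the question's code; use `('order_id', 'size')` instead if every product line should count as a purchase.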

Answer 1

Use:

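import pandas as pd

# df is the transactions DataFrame from the question, with order_date already converted to datetime
date = pd.Timestamp('1991-01-01')
last = [30, 60, 90]

# start of each look-back window: date shifted back by each value in last
a = [date - pd.Timedelta(x, unit='d') for x in last]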


d1 = {}
#create new columns by conditions with between
for i, x in enumerate(a):
    df['last_' + str(last[i])] = df['order_date'].between(x, date).astype(int)
    #create dictionary for aggregate
    d1['last_' + str(last[i])] = 'sum' 

#aggregating dictionary
d = {'customer_id':'size', 'product_type_id':'nunique', 'net_spend':'sum'}
#add d1 to d
d.update(d1)
print (d)
{'product_type_id': 'nunique', 'last_30': 'sum', 'net_spend': 'sum', 
 'last_60': 'sum', 'customer_id': 'size', 'last_90': 'sum'}

df1 = df.groupby('customer_id').agg(d)

#change order of columns if necessary    
cs = df1.columns
m = cs.str.startswith('last')
cols = cs[~m].tolist() + cs[m].tolist()

df1 = df1.reindex(columns=cols)

print (df1)
             product_type_id  net_spend  customer_id  last_30  last_60  last_90
customer_id
79                         2      1.615            3        0        2        3
156                        2      0.652            2        0        0        0
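The answer's approach extends to all seven windows from the question by building the aggregation spec in the same loop. A minimal sketch, assuming order_date is datetime64 and using named aggregation so that the column order follows insertion order and no reindexing step is needed; `n_rows` (the number of transaction rows per customer, which is what `'customer_id': 'size'` counts in the answer) and `aggs` are illustrative names.

```python
import pandas as pd

# The five sample rows from the question (only the columns needed here)
df = pd.DataFrame({
    'customer_id': [79, 79, 79, 156, 156],
    'order_id': [822067, 771456, 771456, 720709, 720709],
    'order_date': pd.to_datetime(['1990-10-21', '1990-11-29', '1990-11-29',
                                  '1990-06-08', '1990-06-08']),
    'product_type_id': [139, 139, 432, 474, 164],
    'net_spend': [0.174, 1.236, 0.205, 0.374, 0.278],
})

date = pd.Timestamp('1991-01-01')
last = [30, 60, 90, 120, 150, 180, 360]

# Totals first, then one entry per window (dicts keep insertion order)
aggs = {
    'n_rows': ('order_id', 'size'),                      # transaction rows per customer
    'product_type_id': ('product_type_id', 'nunique'),  # distinct categories
    'net_spend': ('net_spend', 'sum'),                   # total net spend
}
for n in last:
    col = f'last_{n}'
    # 1 if the order falls in [date - n days, date], else 0 (same test as the answer)
    df[col] = df['order_date'].between(date - pd.Timedelta(days=n), date).astype(int)
    aggs[col] = (col, 'sum')

df1 = df.groupby('customer_id').agg(**aggs)
print(df1)
```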
