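I'm working on a problem that looks simple. The data is as follows:
For example, a customer's current event ID is abc. I need to find all eventIds for every customer as a list, starting with the first event ID, then the next event, and so on up to the latest event ID.
My approach for a single customer is below: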
import pandas as pd
import numpy as np
data = pd.read_csv('test.csv')
data.to_dict()
{'customerid': {0: 233, 1: 250, 2: 233, 3: 250, 4: 233},
'eventid': {0: 'abc', 1: 'bcd', 2: 'edc', 3: 'abl', 4: 'cdl'},
'date': {0: '2019-12-10',
1: '2019-12-08',
2: '2008-12-10',
3: '2019-12-01',
4: '2001-12-10'},
'previouseventid': {0: 'edc', 1: 'abl', 2: 'cdl', 3: np.nan, 4: np.nan}}
   customerid eventid        date previouseventid
0         233     abc  2019-12-10             edc
1         250     bcd  2019-12-08             abl
2         233     edc  2008-12-10             cdl
3         250     abl  2019-12-01             NaN
4         233     cdl  2001-12-10             NaN
# assumed: only customer 233's rows, renumbered from 0
cust_233 = data[data['customerid'] == 233].reset_index(drop=True)
temp = [cust_233['eventid'][0]]
for i in range(len(cust_233['previouseventid']) - 1):
    if pd.notna(cust_233['previouseventid'][i]):
        temp.append(cust_233['previouseventid'][i])
    else:
        break
# temp is now ['abc', 'edc', 'cdl'] (newest first)
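The loop above depends on the rows already being stored in chain order. A minimal sketch that instead follows the previouseventid links for every customer could look like this. It assumes test.csv holds exactly the sample rows shown above and that each customer's chain has no cycles; the function name event_chains is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to match test.csv).
data = pd.DataFrame({
    'customerid': [233, 250, 233, 250, 233],
    'eventid': ['abc', 'bcd', 'edc', 'abl', 'cdl'],
    'date': ['2019-12-10', '2019-12-08', '2008-12-10', '2019-12-01', '2001-12-10'],
    'previouseventid': ['edc', 'abl', 'cdl', np.nan, np.nan],
})


def event_chains(df):
    """Hypothetical helper: {customerid: [oldest, ..., newest]} built by
    following previouseventid links instead of relying on row order."""
    chains = {}
    for cust, group in df.groupby('customerid'):
        # Map each event to the event that preceded it (NaN for the first one).
        prev_of = dict(zip(group['eventid'], group['previouseventid']))
        # The newest event is the one no other event points back to.
        referenced = set(group['previouseventid'].dropna())
        current = next(e for e in group['eventid'] if e not in referenced)
        chain = []
        # Walk backwards until the previous id is missing.
        while pd.notna(current):
            chain.append(current)
            current = prev_of.get(current, np.nan)
        chains[int(cust)] = chain[::-1]  # reverse so the oldest event comes first
    return chains


print(event_chains(data))
```

Because it walks the links, this version does not care how the rows are ordered in the file.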
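My approach feels clumsy and uses a lot of code. How can I solve this efficiently for all customers?
Update:
The output I need is a list. For customer 233 the expected output is ['cdl','edc','abc'], and for customer 250 it is ['abl','bcd'].
Answer 0 (score: 4)
Groupby and then shift should work: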
# First, make sure your data is sorted from oldest to newest
df['date'] = pd.to_datetime(df['date'])
df.sort_values('date', inplace=True)
# Get previous event through groupby operation
df['prev_id'] = df.groupby('customerid')['eventid'].shift(1)
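As a quick sanity check, here is a sketch that runs the sort-then-shift step on the question's sample rows (assumed to be the contents of test.csv, loaded here as df). After sorting by date, the computed prev_id should reproduce the existing previouseventid column.

```python
import numpy as np
import pandas as pd

# Sample data from the question, named df as in this answer.
df = pd.DataFrame({
    'customerid': [233, 250, 233, 250, 233],
    'eventid': ['abc', 'bcd', 'edc', 'abl', 'cdl'],
    'date': ['2019-12-10', '2019-12-08', '2008-12-10', '2019-12-01', '2001-12-10'],
    'previouseventid': ['edc', 'abl', 'cdl', np.nan, np.nan],
})

# Oldest to newest, so shift(1) looks at the chronologically previous row.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

# Within each customer, shift(1) pulls down the eventid of the preceding row;
# the first event of each customer gets a missing value.
df['prev_id'] = df.groupby('customerid')['eventid'].shift(1)

print(df[['customerid', 'eventid', 'previouseventid', 'prev_id']])
```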
If you want a list for each customer:
# create a dictionary with stored values – keys are customer id
prev_events_dict = df.groupby('customerid')['eventid'].apply(list).to_dict()
# map dict to dataframe
df['list_of_prev_id'] = df['customerid'].map(prev_events_dict)
Answer 1 (score: 4)
You can create the lists like this:
df['previouseventid'] = df['customerid'].map(df.groupby('customerid')['eventid'].apply(list))
Output:
customerid eventid date previouseventid
0 233 abc 2019-12-10 [abc, edc, cdl]
1 250 bcd 2019-12-08 [bcd, abl]
2 233 edc 2008-12-10 [abc, edc, cdl]
3 250 abl 2019-12-01 [bcd, abl]
4 233 cdl 2001-12-10 [abc, edc, cdl]
On its own, df.groupby('customerid')['eventid'].apply(list) gives just the lists:
df.groupby('customerid')['eventid'].apply(list)
customerid
233 [abc, edc, cdl]
250 [bcd, abl]
Name: eventid, dtype: object
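Note that apply(list) collects eventid in the frame's existing row order, not by date, which is why the lists above read [abc, edc, cdl] instead of the oldest-first order asked for. A sketch, assuming the question's sample rows, that sorts by date before grouping; the column name event_list is hypothetical (chosen so the original previouseventid links are not overwritten):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customerid': [233, 250, 233, 250, 233],
    'eventid': ['abc', 'bcd', 'edc', 'abl', 'cdl'],
    'date': ['2019-12-10', '2019-12-08', '2008-12-10', '2019-12-01', '2001-12-10'],
    'previouseventid': ['edc', 'abl', 'cdl', np.nan, np.nan],
})

# Without sorting, each list follows the rows' current order.
unsorted = df.groupby('customerid')['eventid'].apply(list)

# Sorting by parsed dates first makes each list chronological (oldest first).
chronological = (
    df.assign(date=pd.to_datetime(df['date']))
      .sort_values('date')
      .groupby('customerid')['eventid']
      .apply(list)
)

# Broadcast each customer's list back onto its rows (df keeps its original order).
df['event_list'] = df['customerid'].map(chronological)
print(df)
```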
Answer 2 (score: 2)
Try this:
data.sort_values('date', ascending=True).groupby('customerid', sort=False)['eventid'].agg(list)
Output:
customerid
233 [cdl, edc, abc]
250 [abl, bcd]
Name: eventid, dtype: object
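Sorting the raw date strings works here only because they are in YYYY-MM-DD form, which sorts lexically in date order. A slightly more defensive sketch, assuming the same sample rows, parses the dates first and turns the result into a plain dict keyed by customer ID:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'customerid': [233, 250, 233, 250, 233],
    'eventid': ['abc', 'bcd', 'edc', 'abl', 'cdl'],
    'date': ['2019-12-10', '2019-12-08', '2008-12-10', '2019-12-01', '2001-12-10'],
    'previouseventid': ['edc', 'abl', 'cdl', np.nan, np.nan],
})

# Parse the dates so sorting is chronological regardless of the string format.
data['date'] = pd.to_datetime(data['date'])

# sort=False keeps customers in order of their first (oldest) event.
chains = (
    data.sort_values('date')
        .groupby('customerid', sort=False)['eventid']
        .agg(list)
        .to_dict()
)
print(chains)
```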