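I currently have a dictionary that stores user IDs as keys and the events each user performed as a list of tuples. Each tuple contains the date the event was performed and the event itself.
Here is an excerpt of the dictionary: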
{
    '56d306892fcf7d8a0563b488bbe72b0df1c42f8b62edf18f68a180eab2ca7dc5':
        [('2018-10-24T08:30:12.761Z', 'booking_initialized')],
    'ac3406118670ef98ee2e3e76ab0f21edccba7b41fa6e4960eea10d2a4d234845':
        [('2018-10-20T14:12:35.088Z', 'visited_hotel'),
         ('2018-10-20T14:17:38.521Z', 'visited_hotel'),
         ('2018-10-20T14:16:41.968Z', 'visited_hotel'),
         ('2018-10-20T13:39:36.064Z', 'search_hotel'),
         ('2018-10-20T13:47:03.086Z', 'visited_hotel')],
    '19813b0b79ec87975e42e02ff34724dd960c7b05efec71477ec66fb04b6bed9c':
        [('2018-10-10T18:10:10.242Z', 'referal_code_shared')]
}
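I also have a DataFrame with the corresponding columns:
Columns: [visited_hotel, search_hotel, booking_initialized, referal_code_shared]
What I want to do is iterate over each dictionary entry and append it to my DataFrame as a row. Each value in a row is a number indicating how many times the user performed that event.
For example, after reading my dictionary excerpt, my DataFrame would look like this: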
   visited_hotel  search_hotel  booking_initialized  referal_code_shared
0              0             0                    1                    0
1              4             1                    0                    0
2              0             0                    0                    1
Thanks in advance :)
Answer 0 (score: 0)
from collections import Counter
import pandas as pd
# d is your dictionary of values
result = {user: Counter(x[1] for x in records)
          for user, records in d.items()}
df = pd.DataFrame(result).fillna(0).T.reset_index(drop=True)
A more concise approach:
result = {i: Counter(x[1] for x in records)
          for i, records in enumerate(d.values())}
df = pd.DataFrame(result).fillna(0).T
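One detail worth noting: `fillna(0)` replaces the missing values, but the columns stay floating-point because the NaNs forced a float dtype. The sketch below is one possible variant that casts the counts back to integers and keeps the user IDs as the row index. The keys `user_a` and `user_b` are shortened, hypothetical stand-ins for the real hashes.

```python
from collections import Counter

import pandas as pd

# Shortened, hypothetical stand-in for the dictionary in the question.
d = {
    "user_a": [("2018-10-24T08:30:12.761Z", "booking_initialized")],
    "user_b": [
        ("2018-10-20T14:12:35.088Z", "visited_hotel"),
        ("2018-10-20T13:39:36.064Z", "search_hotel"),
        ("2018-10-20T13:47:03.086Z", "visited_hotel"),
    ],
}

# Count event names per user; the timestamp in each tuple is ignored.
result = {user: Counter(event for _, event in records)
          for user, records in d.items()}

# orient="index" turns each user into a row, so no transpose is needed.
# Events a user never performed come out as NaN, hence fillna + astype.
df = pd.DataFrame.from_dict(result, orient="index").fillna(0).astype(int)
print(df)
```

If the user IDs are not needed, `df.reset_index(drop=True)` gives the plain 0, 1, 2, ... index shown in the question.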
If you want the columns in a specific order:
cols = ['visited_hotel', 'search_hotel', 'booking_initialized', 'referal_code_shared']
df = df.loc[:, cols]
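Be aware that `df.loc[:, cols]` raises a `KeyError` if one of the listed events never occurred in the data, since that column does not exist yet. A possible alternative is `reindex`, which orders the columns and fills any missing ones with zeros. The small frame below is made-up sample data in which nobody shared a referral code.

```python
import pandas as pd

cols = ['visited_hotel', 'search_hotel', 'booking_initialized', 'referal_code_shared']

# Hypothetical counts: the 'referal_code_shared' column is absent
# because no user in this sample performed that event.
df = pd.DataFrame({
    'booking_initialized': [1, 0],
    'search_hotel': [0, 1],
    'visited_hotel': [0, 4],
})

# reindex orders the columns as requested and creates the missing
# one, filled with 0 instead of NaN.
ordered = df.reindex(columns=cols, fill_value=0)
print(ordered)
```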
Answer 1 (score: 0)
import numpy as np
import pandas as pd

d = {
'56d306892fcf7d8a0563b488bbe72b0df1c42f8b62edf18f68a180eab2ca7dc5': [('2018-10-24T08:30:12.761Z', 'booking_initialized')],
'ac3406118670ef98ee2e3e76ab0f21edccba7b41fa6e4960eea10d2a4d234845': [('2018-10-20T14:12:35.088Z', 'visited_hotel'), ('2018-10-20T14:17:38.521Z', 'visited_hotel'), ('2018-10-20T14:16:41.968Z', 'visited_hotel'), ('2018-10-20T13:39:36.064Z', 'search_hotel'), ('2018-10-20T13:47:03.086Z', 'visited_hotel')],
'19813b0b79ec87975e42e02ff34724dd960c7b05efec71477ec66fb04b6bed9c': [('2018-10-10T18:10:10.242Z', 'referal_code_shared')]
}
def user_actions(user, actions):
    # Convert the actions to a DataFrame
    df = pd.DataFrame(actions).rename(columns={0: 'timestamp', 1: 'action'})
    # Count each action
    counted = df.groupby(['action'])['timestamp'].agg('count').reset_index().rename(columns={'timestamp': 'counter'})
    # Pivot the result so each action is a column
    pivoted = counted.pivot(columns='action', values='counter')
    return pivoted

# Process each user's actions and concatenate all
all_actions_df = pd.concat([user_actions(user, user_actions_list) for user, user_actions_list in d.items()]).replace(np.nan, 0)
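Output (a user with several distinct events spans several rows, because pivot is called without an index):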
   booking_initialized  referal_code_shared  search_hotel  visited_hotel
0                  1.0                  0.0           0.0            0.0
0                  0.0                  0.0           1.0            0.0
1                  0.0                  0.0           0.0            4.0
0                  0.0                  1.0           0.0            0.0
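To get exactly one row per user, as the question asks, one option is to flatten the dictionary into a long (user, action) table and cross-tabulate it. This is a different technique from the pivot-per-user function above, offered only as a sketch. The keys `user_a` and `user_b` are shortened, hypothetical stand-ins for the real hashes.

```python
import pandas as pd

# Shortened, hypothetical stand-in for the dictionary in the question.
d = {
    "user_a": [("2018-10-24T08:30:12.761Z", "booking_initialized")],
    "user_b": [
        ("2018-10-20T14:12:35.088Z", "visited_hotel"),
        ("2018-10-20T13:39:36.064Z", "search_hotel"),
        ("2018-10-20T13:47:03.086Z", "visited_hotel"),
    ],
}

# One row per (user, action) occurrence; the timestamp is dropped.
long_df = pd.DataFrame(
    [(user, action) for user, events in d.items() for _, action in events],
    columns=["user", "action"],
)

# crosstab counts how often each action occurs for each user and
# fills combinations that never occur with 0.
counts = pd.crosstab(long_df["user"], long_df["action"])
print(counts)
```

The rows come back sorted by user ID rather than in dictionary order, and the counts are integers, so no `replace(np.nan, 0)` step is needed.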