Question

I have a pandas DataFrame, my_data, that looks like this:

    event_id    user_id    attended
0     13          345         1
1     14          654         0
...

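So both event_id and user_id contain duplicates, because there is one entry for every user/event combination. What I want is to reshape this into a DataFrame whose index (rows) is the DISTINCT user_ids, whose columns are the DISTINCT event_ids, and whose value at a given (row, col) is just the boolean 0 or 1 for whether that user attended.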

The pivot method seems appropriate, but when I try my_data.pivot(index='user_id', columns='event_id', values='attended'), I get an error saying the index contains duplicates.

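I was thinking I should first do some kind of groupby on user_id, but I don't want to sum all the attended 1's and 0's for each user, because I specifically want event_id separated out as my new columns, preserving which event each user attended.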

Any help is much appreciated, thanks!

Answer 1

IIUC, pivot_table should give you what you want:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"event_id": np.random.randint(10, 20, 20),
...                    "user_id": np.random.randint(100, 110, 20),
...                    "attended": np.random.randint(0, 2, 20)})
>>> df.pivot_table(index="user_id", columns="event_id", values="attended",
...                aggfunc="sum").fillna(0)
event_id  10  11  12  13  14  15  16  17  19
user_id                                     
101        0   0   0   1   0   0   0   0   0
103        0   0   0   0   0   0   0   0   0
104        0   0   0   0   0   0   0   0   1
105        0   0   0   0   0   0   0   0   0
106        0   0   0   0   0   0   1   0   0
107        1   0   0   0   0   0   0   1   0
108        0   0   0   1   0   0   0   0   0
109        0   0   0   0   1   0   1   0   0

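Note that if multiple rows share the same user/event combination (which probably isn't the case), this will sum the attendances. If you want to guarantee that the frame contains only 0s and 1s, it's easy to use any instead, or to clip the values.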
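As a rough sketch of those two options (not part of the original answer): the sample frame below is made up and deliberately repeats one user/event pair. Using "max" as the aggregation is an assumption that stands in for "any", which is equivalent when attended only holds 0 or 1; the second variant sums and then clips.

```python
import pandas as pd

# Hypothetical sample data; the (345, 13) pair appears twice on purpose.
my_data = pd.DataFrame({
    "event_id": [13, 14, 13, 13],
    "user_id": [345, 654, 654, 345],
    "attended": [1, 0, 1, 1],
})

# Option 1: aggregate with "max", so duplicates collapse to 1 if any row is 1.
by_max = my_data.pivot_table(
    index="user_id", columns="event_id", values="attended",
    aggfunc="max", fill_value=0,
).astype(int)

# Option 2: sum as in the answer above, then clip so no cell exceeds 1.
by_clip = my_data.pivot_table(
    index="user_id", columns="event_id", values="attended",
    aggfunc="sum", fill_value=0,
).clip(upper=1).astype(int)

print(by_max)
print(by_clip)
```

Both variants should yield the same 0/1 table here; fill_value=0 covers user/event combinations that never occur in the data.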

Python: Pivoting a pandas DataFrame when the desired index series has duplicates
