I have a large Pandas dataset (46 million rows), represented here by a small sample:
import pandas as pd

df = pd.DataFrame([[0, 0, 0, 34],[0, 0, 1, 23],[0, 1, 0, 14],[0, 1, 1, 11],[1, 0, 0, 73],[1, 0, 1, 33],[1, 1, 0, 96],[1, 1, 1, 64],[2, 0, 0, 4],[2, 0, 1, 13],[2, 1, 0, 31],[2, 1, 1, 10]])
df.columns = ['month','player','team','skill']
For each month we have the Cartesian product of players and teams:
id month player team skill
0 0 0 0 34
1 0 0 1 23
2 0 1 0 14
3 0 1 1 11
4 1 0 0 73
5 1 0 1 33
6 1 1 0 96
7 1 1 1 64
8 2 0 0 4
9 2 0 1 13
10 2 1 0 31
11 2 1 1 10
I want to shift the values of the skill column by one month, to get something like this:
0 0 0 0 73
1 0 0 1 33
2 0 1 0 96
3 0 1 1 64
4 1 0 0 4
5 1 0 1 13
6 1 1 0 31
7 1 1 1 10
8 2 0 0 NaN
9 2 0 1 NaN
10 2 1 0 NaN
11 2 1 1 NaN
How can I do this efficiently in Pandas? Thanks!
Answer 0 (score: 0)
If I understand you correctly, you want to find the skill of the same player-team combination in the following month. You can do this with groupby and transform:
# Sort the rows by `player-team-month` combination so that the
# next row is the subsequent month for the same `player-team`
# or a new `player-team`
tmp = df.sort_values(['player', 'team', 'month'])
# The groupby here serves to divide the dataframe by `player-team`
# Each group is now ordered by `month` so `skill.shift(-1)` can
# give us the `skill` in the following month
skill = tmp.groupby(['player', 'team'])['skill'].transform(lambda s: s.shift(-1))
# Combine the shifted skill with the original attributes;
# `sort_index()` restores the row order of the original dataframe
result = pd.concat([tmp[['month', 'player', 'team']], skill], axis=1).sort_index()
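On 46 million rows, calling a Python lambda once per group through transform may be slow. A possible alternative, shown here only as a sketch, is to call the built-in shift directly on the grouped column. It assumes, as in the question, that every player-team pair has a row for every month; if months were missing, shift(-1) would return the next available month rather than month + 1. The names ordered and next_skill are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[0, 0, 0, 34], [0, 0, 1, 23], [0, 1, 0, 14], [0, 1, 1, 11],
     [1, 0, 0, 73], [1, 0, 1, 33], [1, 1, 0, 96], [1, 1, 1, 64],
     [2, 0, 0, 4], [2, 0, 1, 13], [2, 1, 0, 31], [2, 1, 1, 10]],
    columns=['month', 'player', 'team', 'skill'],
)

# Order the rows so that, within each player-team group,
# consecutive rows are consecutive months
ordered = df.sort_values(['player', 'team', 'month'])

# Shift within each group without a Python-level lambda;
# the last month of every group has no successor and becomes NaN
next_skill = ordered.groupby(['player', 'team'])['skill'].shift(-1)

# Assignment aligns on the index, so the original row order is kept
result = df.assign(skill=next_skill)
print(result)
```

Because the shifted column contains NaN, skill becomes a float column in the result.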