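I'm working with NBA game data that contains a player ID for each defensive player and each offensive player. I'd like to add a column for each lineup combination, i.e. deflinid and offlinid.
Here is code for a sample of the dataset: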
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[11,12,13,14,15,1,2,3,4,5,5,5],[11,12,13,14,15,1,2,3,4,6,10,10],[11,12,13,14,16,2,3,4,5,6,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])
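From there, I'd like to create columns holding a lineup ID for each unique combination of five player IDs.
Here is an example of the two columns I'd like to generate from the sample df above and add to it: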
offlinid deflinid
1 4
2 4
3 4
4 1
4 2
5 3
Thanks!
Answer 0 (score: 1)
Use pd.concat to stack the offplayerX columns on top of the defplayerX columns. Next, agg each row into a tuple, then call rank and unstack:
offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']
# Stack the offensive rows on top of the defensive rows under shared column names.
df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))],
                keys=['offlinid', 'deflinid'])
# Turn each row into a tuple, dense-rank the tuples, then move the keys into columns.
df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)
Out[92]:
offlinid deflinid
0 1.0 4.0
1 2.0 4.0
2 3.0 4.0
3 4.0 1.0
4 4.0 2.0
5 5.0 3.0
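Note that rank returns floats, and that the tuples depend on the order of the playerN columns. If integer IDs are preferred, or the same five players might appear in a different column order, a plain-Python variant of the same idea could look like the sketch below. It assumes a lineup is defined by its set of five IDs regardless of column order, which the question does not state explicitly; the helper name lineup_keys is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 5, 5],
              [1, 2, 3, 4, 6, 11, 12, 13, 14, 15, 4, 4],
              [2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 3, 5],
              [11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 5, 5],
              [11, 12, 13, 14, 15, 1, 2, 3, 4, 6, 10, 10],
              [11, 12, 13, 14, 16, 2, 3, 4, 5, 6, 5, 5]]),
    columns=['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5',
             'defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5',
             'possessions', 'points'])

offcols = [f'offplayer{i}' for i in range(1, 6)]
defcols = [f'defplayer{i}' for i in range(1, 6)]


def lineup_keys(frame, cols):
    """Return one tuple of player IDs per row (hypothetical helper).

    IDs are sorted within each row so the key does not depend on
    column order. This is an assumption, not something the question states.
    """
    sorted_ids = np.sort(frame[cols].to_numpy(), axis=1)
    return [tuple(row) for row in sorted_ids.tolist()]


off_keys = lineup_keys(df, offcols)
def_keys = lineup_keys(df, defcols)

# One shared, 1-based numbering for offensive and defensive lineups,
# ordered by the sorted tuples (the same order a dense rank would give).
all_keys = sorted(set(off_keys) | set(def_keys))
lineup_id = {key: i for i, key in enumerate(all_keys, start=1)}

# Attach the integer IDs directly to the original frame.
df['offlinid'] = [lineup_id[k] for k in off_keys]
df['deflinid'] = [lineup_id[k] for k in def_keys]
```

On the sample data this gives the same numbering as the desired output in the question, as integers and already attached to df. If column order should matter, drop the np.sort call and build the tuples from the raw rows instead.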