How do I add an id column based on unique combinations of other columns?

Time: 2019-12-10 00:10:56

Tags: python pandas

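I am working with NBA game data that contains a player ID for each defensive player and each offensive player. I want to add a column identifying each lineup combination: deflinid for the defensive lineup and offlinid for the offensive lineup.

Here is code for a sample of the dataset: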

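import numpy as np
import pandas as pd

# Each row is one matchup: five offensive player IDs, five defensive player IDs,
# then the possessions and points recorded for that matchup.
df = pd.DataFrame(
    np.array([
        [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 5, 5],
        [1, 2, 3, 4, 6, 11, 12, 13, 14, 15, 4, 4],
        [2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 3, 5],
        [11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 5, 5],
        [11, 12, 13, 14, 15, 1, 2, 3, 4, 6, 10, 10],
        [11, 12, 13, 14, 16, 2, 3, 4, 5, 6, 5, 5],
    ]),
    columns=['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5',
             'defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5',
             'possessions', 'points'],
)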

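From there, I want to create columns holding a lineup ID for each unique 5-player ID combination. The same combination should get the same ID whether it appears on offense or on defense.

Here are the 2 columns I would like to generate from the sample df above and add to it: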

offlinid  deflinid
       1         4
       2         4
       3         4
       4         1
       4         2
       5         3

Thanks!

1 answer:

Answer 0 (score: 1)

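Use pd.concat to stack the offplayerX columns on top of the defplayerX columns (after renaming the defplayerX columns so the two blocks line up). Next, use agg to turn each row into a tuple, then call rank and unstack. Note that rank returns floats, so the resulting IDs are 1.0, 2.0, and so on rather than integers.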

offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']

df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))], 
                 keys=['offlinid',  'deflinid'])

df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)

Out[92]:
   offlinid  deflinid
0       1.0       4.0
1       2.0       4.0
2       3.0       4.0
3       4.0       1.0
4       4.0       2.0
5       5.0       3.0
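
For comparison, here is a minimal sketch of a different route to the same numbering that produces integer IDs directly. It is not the answerer's method: it swaps rank for np.unique with axis=0, which returns the unique rows in lexicographic order along with the index of each input row among them. The variable names stacked, inverse and ids are illustrative only, and the sketch assumes (as the answer above does) that a lineup is identified by its player IDs in column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([
        [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 5, 5],
        [1, 2, 3, 4, 6, 11, 12, 13, 14, 15, 4, 4],
        [2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 3, 5],
        [11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 5, 5],
        [11, 12, 13, 14, 15, 1, 2, 3, 4, 6, 10, 10],
        [11, 12, 13, 14, 16, 2, 3, 4, 5, 6, 5, 5],
    ]),
    columns=['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5',
             'defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5',
             'possessions', 'points'],
)

offcols = [f'offplayer{i}' for i in range(1, 6)]
defcols = [f'defplayer{i}' for i in range(1, 6)]

# Stack offensive lineups on top of defensive lineups: shape (2 * len(df), 5).
stacked = np.vstack([df[offcols].to_numpy(), df[defcols].to_numpy()])

# Unique rows come back sorted lexicographically; `inverse` gives, for each
# stacked row, the position of its lineup among those unique rows.
_, inverse = np.unique(stacked, axis=0, return_inverse=True)

# ravel() guards against NumPy versions that return a 2-D inverse; +1 makes IDs 1-based.
ids = inverse.ravel() + 1

n = len(df)
df['offlinid'] = ids[:n]   # first half of the stack is the offensive lineups
df['deflinid'] = ids[n:]   # second half is the defensive lineups

print(df[['offlinid', 'deflinid']])
```

If the same five players could appear in a different column order from row to row, sorting each row first with np.sort(stacked, axis=1) would make the IDs order-independent. Whether that is needed depends on the real data, which the question does not specify.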