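I have a dataset of NBA players broken down by game. Is there a way to get the equivalent of the mode statistic, which is usually applied to numeric values, for the string value that occurs most frequently?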
   t1_start1         t1_start2         t1_start3     t1_start4     t1_start5   team1
0  Shaquille O'Neal  Kobe Bryant       Horace Grant  Ron Harper    Rick Fox    LAL
1  Shaquille O'Neal  Kobe Bryant       Horace Grant  Ron Harper    Rick Fox    LAL
2  Kobe Bryant       Shaquille O'Neal  Horace Grant  Ron Harper    Brian Shaw  LAL
3  Kobe Bryant       Shaquille O'Neal  Horace Grant  Brian Shaw    Ron Harper  LAL
4  Kobe Bryant       Shaquille O'Neal  Horace Grant  Ron Harper    Brian Shaw  LAL
5  LeBron James      Brandon Ingram    Kyle Kuzma    JaVale McGee  Lonzo Ball  LAL
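Regardless of the order in which the starters appear (t1_start1 | t1_start2 | t1_start3 | ...), how can I get the top 5 most frequent starters over the last 3 rows, grouped by the team1 column?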
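To make the example reproducible, here is a minimal sketch that rebuilds the sample frame above with pandas. The column names come from the printout; the list-of-lists construction is an assumption about how the data could be loaded. It also shows that `Series.mode()` already works on string columns and returns the most frequent value(s):

```python
import pandas as pd

# Rebuild the sample rows shown above (one row per game: five starters plus the team).
rows = [
    ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox", "LAL"],
    ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox", "LAL"],
    ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
    ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Brian Shaw", "Ron Harper", "LAL"],
    ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
    ["LeBron James", "Brandon Ingram", "Kyle Kuzma", "JaVale McGee", "Lonzo Ball", "LAL"],
]
cols = ["t1_start1", "t1_start2", "t1_start3", "t1_start4", "t1_start5", "team1"]
df = pd.DataFrame(rows, columns=cols)

# Series.mode() accepts strings; it returns every tied value, sorted.
print(df["t1_start1"].mode().tolist())  # ['Kobe Bryant']
```

Note that `mode()` works on one column at a time, so counting across all five starter columns still needs one of the approaches in the answers.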
Answer 0 (score: 1)
You can combine np.unique() with return_counts=True and np.argsort():
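import numpy as np
players, starts = np.unique(df[['t1_start1','t1_start2','t1_start3','t1_start4','t1_start5']].values, return_counts=True)  # unique names and their counts over all rows
players[np.argsort(-starts, kind='stable')][:5]  # descending by count; the stable sort keeps tied names in alphabetical order
Returns: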
['Horace Grant' 'Kobe Bryant' 'Ron Harper' "Shaquille O'Neal" 'Brian Shaw']
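The snippet above counts over every row of the frame. A sketch extending the same np.unique idea to what the question actually asks, the last 3 rows of each team1 group, might look like the following. The helper name top_starters is hypothetical, "last 3 rows" is assumed to mean the last 3 rows in the frame's current order, and the sample uses rows 2 to 5 from the question:

```python
import numpy as np
import pandas as pd

STARTERS = ["t1_start1", "t1_start2", "t1_start3", "t1_start4", "t1_start5"]

def top_starters(df, n_rows=3, k=5):
    """Hypothetical helper: the k most frequent starters in each team's last n_rows rows."""
    result = {}
    for team, group in df.groupby("team1"):
        # Only the starter columns are counted, so the team name itself is never a "player".
        names, counts = np.unique(group[STARTERS].tail(n_rows).to_numpy(), return_counts=True)
        order = np.argsort(-counts, kind="stable")  # descending count; ties stay alphabetical
        result[team] = names[order][:k].tolist()
    return result

df = pd.DataFrame(
    [
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Brian Shaw", "Ron Harper", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["LeBron James", "Brandon Ingram", "Kyle Kuzma", "JaVale McGee", "Lonzo Ball", "LAL"],
    ],
    columns=STARTERS + ["team1"],
)
print(top_starters(df))
```

Here the first sample row is dropped by `tail(3)`, and five players tie with 2 starts each, so the stable sort returns them alphabetically.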
Answer 1 (score: 0)
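import pandas as pd
flat_list = df.filter(like='t1_start').iloc[:3].to_numpy().ravel()  # first 3 rows of the starter columns only, flattened to 1-D
print(pd.Series(flat_list).mode().iloc[0])  # most common element (scipy.stats.mode deprecated non-numeric input in SciPy 1.9)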
If you want more than one value, you can use collections.Counter:
import collections
most_common_5 = collections.Counter(flat_list).most_common(5)  # list of (name, count) pairs, highest counts first
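The same Counter idea can be applied per team to the last 3 rows, as the question asks. This is a minimal sketch assuming the starter column names from the question and "last 3 rows" meaning the frame's current order; the variable names recent and top5 are illustrative, and the sample again uses rows 2 to 5 from the question:

```python
import collections

import pandas as pd

STARTERS = ["t1_start1", "t1_start2", "t1_start3", "t1_start4", "t1_start5"]

df = pd.DataFrame(
    [
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Brian Shaw", "Ron Harper", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["LeBron James", "Brandon Ingram", "Kyle Kuzma", "JaVale McGee", "Lonzo Ball", "LAL"],
    ],
    columns=STARTERS + ["team1"],
)

# Keep the last 3 rows of each team, in their original order.
recent = df.groupby("team1").tail(3)

# Count only the starter columns, so "LAL" is not counted as a player.
top5 = {
    team: collections.Counter(group[STARTERS].to_numpy().ravel()).most_common(5)
    for team, group in recent.groupby("team1")
}
print(top5)
```

Unlike the sorted np.unique output, Counter.most_common orders tied counts by first appearance, so here Kobe Bryant comes first because he is the first name in the first kept row.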