df_ppc.info() gives:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 4 columns):
Player 892 non-null object
Mean 892 non-null object
Team 892 non-null object
Position 892 non-null object
If I group like this:
df = df_ppc.groupby(['Player'])['Mean'].max().sort_values(ascending=False)
it works.
But if I group like this:
df = df_ppc.groupby(['Player', 'Team'])['Mean'].max().sort_values(ascending=False)
it throws:
File "pandas/_libs/hashtable_class_helper.pxi", line 1798, in pandas._libs.hashtable.PyObjectHashTable.factorize
File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'numpy.ndarray'
Why? How can I fix it?
Edit:
Sample table:
             Player    Mean      Team        Posicao
715  Richard Franco  0.2354      Avaí  Meio-Campista
12       Alan Costa  0.6543       CSA       Zagueiro
14      Alan Santos  0.0345  Botafogo  Meio-Campista
df_ppc is built as follows:
position = df_players.groupby('Player')['position'].agg(pd.Series.mode)
team = df_players.groupby('Team')['time_nome'].agg(pd.Series.mode)
mean = df_players.groupby('atleta_nome').mean()['points']
df_ppc = pd.DataFrame([team, position, mean]).T
df_ppc.columns = ['Team','Position','Mean']
df_ppc = df_ppc.reset_index()
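Answer 0 (score: 1)
When building df_ppc, select only the first mode. pd.Series.mode returns a Series rather than a single value, so when two values tie for most frequent, the cell ends up holding a numpy.ndarray, which cannot be hashed when the column is used as a groupby key: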
position = df_players.groupby('Player')['position'].agg(lambda x : x.mode().iloc[0])
team = df_players.groupby('Team')['time_nome'].agg(lambda x : x.mode().iloc[0])
For example, mode() returns both values when they tie:
pd.Series([1,1,2,2]).mode()
Out[24]:
0 1
1 2
dtype: int64
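
To see the effect end to end, here is a minimal sketch that reproduces the error on a toy frame and then applies the first-mode fix. The data is invented for illustration (players "A" and "B", teams "X", "Y", "Z"), and the frame is built with a dict rather than the transposed construction from the question, so treat it as an approximation of the original setup and not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: player "B" appears once for each of two teams (a tie).
# Team is kept as object dtype, matching the info() output in the question.
df_players = pd.DataFrame({
    "Player": ["A", "A", "B", "B"],
    "Team": pd.Series(["X", "X", "Y", "Z"], dtype=object),
    "points": [1.0, 3.0, 2.0, 4.0],
})
grouped = df_players.groupby("Player")

# pd.Series.mode keeps every tied value, so B's cell becomes an array.
team_all = grouped["Team"].agg(pd.Series.mode)
# Taking the first mode always yields one scalar per player.
team_first = grouped["Team"].agg(lambda x: x.mode().iloc[0])
mean = grouped["points"].mean()

bad = pd.DataFrame({"Team": team_all, "Mean": mean}).reset_index()
good = pd.DataFrame({"Team": team_first, "Mean": mean}).reset_index()

# Locate the rows whose Team cell is an array instead of a scalar.
is_array = bad["Team"].map(lambda v: isinstance(v, np.ndarray))
print(bad[is_array])

# Grouping on the column that contains an array fails while hashing the keys.
raised = False
try:
    bad.groupby(["Player", "Team"])["Mean"].max()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With scalar-only cells the two-key groupby works.
result = (
    good.groupby(["Player", "Team"])["Mean"]
    .max()
    .sort_values(ascending=False)
)
print(result)
```

The is_array mask is also a quick way to check an existing df_ppc for offending rows before rebuilding it. Note that iloc[0] picks the first tied value in sorted order, which is an arbitrary tie-break; whether that is acceptable depends on the data.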