pandas: unhashable type: 'numpy.ndarray'

Time: 2020-07-31 00:59:09

Tags: python pandas

df_ppc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 4 columns):
Player     892 non-null object
Mean       892 non-null object
Team       892 non-null object
Position   892 non-null object

If I group by a single column:

df = df_ppc.groupby(['Player'])['Mean'].max().sort_values(ascending=False)

it works.

But if I group by two columns:

df = df_ppc.groupby(['Player', 'Team'])['Mean'].max().sort_values(ascending=False)

it raises:

  File "pandas/_libs/hashtable_class_helper.pxi", line 1798, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'numpy.ndarray'

Why? How can I fix it?

Edit:

SampleTable:

        Player        Mean      Team  \
715  Richard Franco   0.2354   Avaí   
12       Alan Costa   0.6543   CSA   
14      Alan Santos   0.0345   Botafogo   

           Posicao 
715  Meio-Campista       
12        Zagueiro         
14   Meio-Campista  

df_ppc is built as follows:

position = df_players.groupby('Player')['position'].agg(pd.Series.mode)
team = df_players.groupby('Team')['time_nome'].agg(pd.Series.mode)
mean = df_players.groupby('atleta_nome').mean()['points']

df_ppc = pd.DataFrame([team, position, mean]).T

df_ppc.columns = ['Team','Position','Mean']   

df_ppc = df_ppc.reset_index() 

1 answer:

Answer 0 (score: 1)

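When building df_ppc, select only the first mode, because pd.Series.mode returns a Series rather than a single value. When a group has tied modes, the aggregated cell ends up holding a numpy.ndarray, and groupby cannot hash an array when that column is used as a key. That is why grouping by 'Player' alone works while adding 'Team' fails.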

position = df_players.groupby('Player')['position'].agg(lambda x : x.mode().iloc[0])
team = df_players.groupby('Team')['time_nome'].agg(lambda x : x.mode().iloc[0])

For example, a Series with tied values has more than one mode:

pd.Series([1,1,2,2]).mode()
Out[24]: 
0    1
1    2
dtype: int64
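
To make the cause and the fix concrete, here is a minimal sketch on made-up data. The small df_players frame below is a hypothetical stand-in (its column names and values are assumptions, not the asker's real data), and pd.concat is used to assemble df_ppc for brevity rather than the transposed DataFrame from the question. It first shows which aggregated cells hold an array, then rebuilds the column with only the first mode so the two-column groupby runs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_players; names and values are made up.
df_players = pd.DataFrame({
    "Player": ["A", "A", "B", "B", "B"],
    "Team": pd.Series(["X", "Y", "Z", "Z", "W"], dtype=object),
    "points": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Player A has two tied modes (X and Y), so that cell holds an ndarray.
team_raw = df_players.groupby("Player")["Team"].agg(pd.Series.mode)

# Diagnostic: flag the cells that contain an array instead of a scalar.
has_array = team_raw.map(lambda v: isinstance(v, np.ndarray))

# Fix: keep only the first mode, giving one hashable scalar per player.
team = df_players.groupby("Player")["Team"].agg(lambda x: x.mode().iloc[0])
mean = df_players.groupby("Player")["points"].mean().rename("Mean")

df_ppc = pd.concat([team, mean], axis=1).reset_index()

# Grouping by both columns now works because every key is a scalar.
result = (
    df_ppc.groupby(["Player", "Team"])["Mean"]
    .max()
    .sort_values(ascending=False)
)
print(has_array)
print(result)
```

Note that taking iloc[0] breaks ties by picking the first mode in sorted order, which may or may not be the team you want for a player who appears equally often on several teams.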