Question

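I am trying to calculate the median of columns within groups.

I found a very clear example:

Pandas: Calculate Median of Group over Columns

That question and its answer are exactly what I need. I reproduced the exact example so I could work through the details myself: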

import pandas
import numpy

data_3 = [2,3,4,5,4,2]
data_4 = [0,1,2,3,4,2]

df = pandas.DataFrame({'COL1': ['A','A','A','A','B','B'], 
                       'COL2': ['AA','AA','BB','BB','BB','BB'],
                       'COL3': data_3,
                       'COL4': data_4})

m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

When I try to calculate the median of the group over columns, I get this error:

TypeError: Series.name must be a hashable type

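If I use exactly the same code and only replace the median with a different statistic (mean, min, max, std), everything works fine.

I don't understand the cause of this error, or why it occurs only for the median, which is the statistic I actually need.

Thanks in advance for your help,

Bob

Here is the full error message. I am using Python 3.5.2.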

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-af0ef7da3347> in <module>()
----> 1 m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    649         # ignore SettingWithCopy here in case the user mutates
    650         with option_context('mode.chained_assignment', None):
--> 651             return self._python_apply_general(f)
    652 
    653     def _python_apply_general(self, f):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    658             keys,
    659             values,
--> 660             not_indexed_same=mutated or self.mutated)
    661 
    662     def _iterate_slices(self):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3373                 coerce = True if any([isinstance(x, Timestamp)
   3374                                       for x in values]) else False
-> 3375                 return (Series(values, index=key_index, name=self.name)
   3376                         ._convert(datetime=True,
   3377                                   coerce=coerce))

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    231         generic.NDFrame.__init__(self, data, fastpath=True)
    232 
--> 233         self.name = name
    234         self._set_axis(0, index, fastpath=True)
    235 

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   2692             object.__setattr__(self, name, value)
   2693         elif name in self._metadata:
-> 2694             object.__setattr__(self, name, value)
   2695         else:
   2696             try:

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in name(self, value)
    307     def name(self, value):
    308         if value is not None and not com.is_hashable(value):
--> 309             raise TypeError('Series.name must be a hashable type')
    310         object.__setattr__(self, '_name', value)
    311 

TypeError: Series.name must be a hashable type

Answer 1

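Somehow the Series name at this stage is being interpreted as unhashable, even though it is supposedly a tuple. I think it may be the same bug as this one, which has been fixed and closed:

Apply on selected columns of a groupby object - stopped working with 0.18.1 #13568

Basically, having a single scalar value per group (as in your example) caused the Series name to fail to be passed through. It was fixed in 0.19.2.

In any case, this shouldn't be a practical problem, because you can (and should) call mean, median, etc. directly on the GroupBy object.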
>>> df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']].median()
           COL3  COL4
COL1 COL2
A    AA     2.5   0.5
     BB     4.5   2.5
B    BB     3.0   3.0
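
For reference, here is a self-contained sketch of that approach using the data from the question. The variable names `grouped`, `medians` and `medians_agg` are only illustrative, and the `agg('median')` line is an equivalent spelling I added for comparison rather than something the linked issue prescribes.

```python
import pandas

data_3 = [2, 3, 4, 5, 4, 2]
data_4 = [0, 1, 2, 3, 4, 2]

df = pandas.DataFrame({'COL1': ['A', 'A', 'A', 'A', 'B', 'B'],
                       'COL2': ['AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
                       'COL3': data_3,
                       'COL4': data_4})

# Select the value columns once, then reuse the GroupBy object.
grouped = df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']]

# Built-in reduction: one median per group and per column.
medians = grouped.median()

# Same reduction requested by name through agg().
medians_agg = grouped.agg('median')

print(medians)
```

Both calls return a DataFrame indexed by (COL1, COL2), so a single value can be looked up with something like `medians.loc[('A', 'AA'), 'COL3']`.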
