Identically structured MultiIndex DataFrame won't aggregate (mean)

Date: 2016-06-10 06:10:05

Tags: python pandas multi-index

Short question:

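I'm trying to get the mean of one column (a Series) after grouping a MultiIndex pandas DataFrame by an index level, with the DataFrame constructed in two different ways. The only difference is how the DataFrame is constructed: one gives me the result I want, the other raises DataError: No numeric types to aggregate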

Details

Common data for both constructions

import pandas as pd
import numpy as np
indexTuples = [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)]
multiIndex = pd.MultiIndex.from_tuples(indexTuples, names = ['x', 'y'])

Constructing the DataFrame via method 1

columns = ['alpha', 'beta', 'gamma']
df = pd.DataFrame(index=multiIndex, columns=columns)

alpha = pd.Series(index=multiIndex)
beta = pd.Series(index=multiIndex)
gamma = pd.Series(index=multiIndex)

for tup in indexTuples:
    alpha[tup[0], tup[1]] = np.random.randint(400)
    beta[tup[0], tup[1]] = np.random.randint(400)
    gamma[tup[0], tup[1]] = np.random.randint(400)

df.alpha = alpha
df.beta = beta
df.gamma = gamma

df.alpha['a'] = np.nan

df

gives a DataFrame that looks like this:

     alpha   beta  gamma
x y                     
a 1    NaN  136.0  224.0
b 3  375.0  227.0  191.0
a 2    NaN  367.0  195.0
c 2  247.0   61.0   78.0
  3  238.0  187.0  366.0
b 8  302.0   14.0  272.0    

If I do the following, I get the expected result:

df.groupby(level='x').alpha.mean()

Result:

x
a      NaN
b    148.0
c    244.5
Name: alpha, dtype: float64
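The means printed above come from a different random draw than the table (for the table's values, b would be (375 + 302) / 2 = 338.5). A minimal deterministic sketch of method 1, using the alpha values from the table instead of random numbers. It passes dtype=float explicitly as an assumption, because recent pandas versions may give a Series created without data an object dtype.

```python
import numpy as np
import pandas as pd

# Same MultiIndex as in the question
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)], names=['x', 'y'])

# alpha values copied from the printed table; NaN for the two 'a' rows.
# dtype=float is explicit (assumption) so the column is float64 in any pandas version.
alpha = pd.Series([np.nan, 375.0, np.nan, 247.0, 238.0, 302.0], index=index, dtype=float)
df = pd.DataFrame({'alpha': alpha})

# mean() skips NaN; a group containing only NaN ('a') yields NaN
result = df.groupby(level='x')['alpha'].mean()
print(result)  # a: NaN, b: 338.5, c: 242.5
```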

Constructing the DataFrame via method 2

columns = ['alpha', 'beta', 'gamma']
_df = pd.DataFrame(index=multiIndex, columns=columns)

for tup in indexTuples:
    _df.alpha[tup[0], tup[1]] = np.random.randint(400)
    _df.beta[tup[0], tup[1]] = np.random.randint(400)
    _df.gamma[tup[0], tup[1]] = np.random.randint(400)

_df.alpha['a'] = np.nan

This gives a similar-looking DataFrame with NaN values, as shown for the previous method.

But now, when I try to find the mean after grouping by level,

_df.groupby(level='x').alpha.mean() 

I get the following error:

---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-192-ad2de6450fab> in <module>()
----> 1 _df.groupby(level='x').alpha.mean()

/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in mean(self)
    933         """
    934         try:
--> 935             return self._cython_agg_general('mean')
    936         except GroupByError:
    937             raise

/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
    750 
    751         if len(output) == 0:
--> 752             raise DataError('No numeric types to aggregate')
    753 
    754         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Why does it work in the first case but not in the second?

1 Answer

Answer (score: 2)

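When you build _df, its dtype becomes object. This happens because you define _df without initializing it with any data, so it defaults to object. In construction #1 you avoid this by assigning Series that were constructed independently and therefore have a float dtype. In construction #2 you assign data directly into locations of _df, and those locations are already object dtype.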

_df.dtypes

alpha    object
beta     object
gamma    object
dtype: object

Use this to get the result:

_df.astype(float).groupby(level='x').alpha.mean()
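A minimal sketch of the fix, building _df the way method 2 does but with fixed values. The first option is the answer's astype(float); the second, pd.to_numeric applied per column, is an alternative not mentioned in the answer that also turns the object columns into numeric ones. The value dictionary is hypothetical test data.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)], names=['x', 'y'])

# Method 2 style: empty object-dtype frame, filled cell by cell ('a' rows stay NaN)
_df = pd.DataFrame(index=index, columns=['alpha', 'beta'])
values = {('b', 3): 375, ('c', 2): 247, ('c', 3): 238, ('b', 8): 302}  # hypothetical data
for key, val in values.items():
    _df.loc[key, 'alpha'] = val
    _df.loc[key, 'beta'] = val + 1

# Option 1 (from the answer): cast all columns to float before grouping
result = _df.astype(float).groupby(level='x')['alpha'].mean()
print(result)  # a: NaN, b: 338.5, c: 242.5

# Option 2 (alternative): convert each column with pd.to_numeric
numeric = _df.apply(pd.to_numeric)
print(numeric.dtypes)
```

astype(float) raises if any cell cannot be converted, which surfaces bad data early; pd.to_numeric also accepts errors='coerce' if unparseable values should become NaN instead.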