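I am trying to get the mean of one column (a Series) after grouping a multi-index Pandas DataFrame, and I have built the DataFrame in two different ways. The only difference is how the DataFrame is constructed. One construction gives me the result I want; the other raises DataError: No numeric types to aggregate
Data common to both constructions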
import pandas as pd
import numpy as np
indexTuples = [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)]
multiIndex = pd.MultiIndex.from_tuples(indexTuples, names = ['x', 'y'])
columns = ['alpha', 'beta', 'gamma']
df = pd.DataFrame(index=multiIndex, columns=columns)
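# dtype=float is spelled out here; older pandas (such as the 0.18.0 in the
# traceback) defaulted an empty Series to float64, newer versions to object
alpha = pd.Series(index=multiIndex, dtype=float)
beta = pd.Series(index=multiIndex, dtype=float)
gamma = pd.Series(index=multiIndex, dtype=float)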
for tup in indexTuples:
    alpha[tup[0], tup[1]] = np.random.randint(400)
    beta[tup[0], tup[1]] = np.random.randint(400)
    gamma[tup[0], tup[1]] = np.random.randint(400)
df.alpha = alpha
df.beta = beta
df.gamma = gamma
df.alpha['a'] = np.nan
df
This gives a DataFrame that looks like this
     alpha   beta  gamma
x y
a 1    NaN  136.0  224.0
b 3  375.0  227.0  191.0
a 2    NaN  367.0  195.0
c 2  247.0   61.0   78.0
  3  238.0  187.0  366.0
b 8  302.0   14.0  272.0
If I do the following, I get the expected result
df.groupby(level='x').alpha.mean()
Result
x
a      NaN
b    148.0
c    244.5
Name: alpha, dtype: float64
The second construction writes the values directly into the DataFrame
columns = ['alpha', 'beta', 'gamma']
_df = pd.DataFrame(index=multiIndex, columns=columns)
for tup in indexTuples:
    _df.alpha[tup[0], tup[1]] = np.random.randint(400)
    _df.beta[tup[0], tup[1]] = np.random.randint(400)
    _df.gamma[tup[0], tup[1]] = np.random.randint(400)
_df.alpha['a'] = np.nan
This gives a similar-looking DataFrame with NaN values, just like the previous method.
But now, when I try to find the mean after grouping by level
_df.groupby(level='x').alpha.mean()
I get the following error
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-192-ad2de6450fab> in <module>()
----> 1 _df.groupby(level='x').alpha.mean()
/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in mean(self)
933 """
934 try:
--> 935 return self._cython_agg_general('mean')
936 except GroupByError:
937 raise
/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
750
751 if len(output) == 0:
--> 752 raise DataError('No numeric types to aggregate')
753
754 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
Why does it work in the first case but not in the second?
Answer 0 (score: 2)
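When you build _df, the dtype of its columns becomes object. This happens because you define _df without initializing it with any data, so it defaults to object. In construction #1 this is overcome by assigning Series that were constructed independently and therefore have a float dtype. In construction #2 you explicitly assign data into locations inside _df, and those locations are already treated as object.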
_df.dtypes
alpha    object
beta     object
gamma    object
dtype: object
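The dtype behaviour described above can be checked on a small frame. This is only a minimal sketch: the index values and the names `empty` and `index` are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row index, used only for this illustration
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 3)], names=['x', 'y'])

# Built from labels only: there is no data to infer a dtype from,
# so every column starts out as object
empty = pd.DataFrame(index=index, columns=['alpha', 'beta'])
print(empty.dtypes)

# Assigning a whole float Series replaces the column, dtype included
empty['alpha'] = pd.Series([1.0, 2.0, 3.0], index=index)

# Writing a single cell keeps the existing object column
empty.loc[('a', 1), 'beta'] = 5

print(empty.dtypes)
```

After these two assignments, alpha should report float64 while beta is still object, which mirrors the difference between construction #1 and construction #2.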
Use this to get the result:
_df.astype(float).groupby(level='x').alpha.mean()
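Another option, sketched here as an assumption rather than something the answer above proposes, is to declare the dtype when the empty frame is created so that no conversion is needed later. The name `frame` and the deterministic values are hypothetical; they stand in for the random integers so the result can be checked.

```python
import numpy as np
import pandas as pd

index_tuples = [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)]
multi_index = pd.MultiIndex.from_tuples(index_tuples, names=['x', 'y'])

# dtype=float creates NaN-filled float64 columns instead of object ones
frame = pd.DataFrame(index=multi_index,
                     columns=['alpha', 'beta', 'gamma'],
                     dtype=float)

# Fill cell by cell, as in construction #2, with made-up values 0, 10, 20, ...
for i, (x, y) in enumerate(index_tuples):
    frame.loc[(x, y), 'alpha'] = i * 10.0

# Blank out every row whose first index level is 'a'
frame.loc[frame.index.get_level_values('x') == 'a', 'alpha'] = np.nan

result = frame.groupby(level='x')['alpha'].mean()
print(result)
```

Because the column is float64 from the start, the grouped mean runs without any astype call; group a is all NaN, so its mean is NaN.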