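I'm running into a problem with something I'm fairly sure used to work (on an older pandas version). On 0.9, I get a "No numeric types to aggregate" error. Any ideas?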
In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)
In [32]: latedges = linspace(-90., 90., 73)
In [33]: lats_new = linspace(-87.5, 87.5, 72)
In [34]: def _get_gridbox_label(x, bins, labels):
....:     return labels[searchsorted(bins, x) - 1]
....:
In [35]: lat_bucket = lambda x: _get_gridbox_label(x, latedges, lats_new)
In [36]: data.T.groupby(lat_bucket).mean()
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-36-ed9c538ac526> in <module>()
----> 1 data.T.groupby(lat_bucket).mean()
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in mean(self)
295 """
296 try:
--> 297 return self._cython_agg_general('mean')
298 except DataError:
299 raise
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
1415
1416 def _cython_agg_general(self, how, numeric_only=True):
-> 1417 new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
1418 return self._wrap_agged_blocks(new_blocks)
1419
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_blocks(self, how, numeric_only)
1455
1456 if len(new_blocks) == 0:
-> 1457 raise DataError('No numeric types to aggregate')
1458
1459 return new_blocks
DataError: No numeric types to aggregate
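Answer 0 (score: 38)
How are you generating your data?
See how the output shows that your data is of 'object' dtype? The groupby operations specifically check whether each column is a numeric dtype first.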
In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)
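Look ↑ (dtypes: object)
Did you initialize an empty DataFrame first and then fill it? If so, that is probably why the behavior changed with the new version: before 0.9, empty DataFrames were initialized to float dtype, but now they are object dtype. In that case you can change the initialization to DataFrame(dtype=float).
You can also call frame.astype(float).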
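A minimal sketch of the astype(float) route, assuming a current pandas and NumPy. The frame below is a small hypothetical stand-in for the asker's data (numeric values stored as object dtype), and the two-bin latitude edges are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: numbers stored with object dtype.
index = pd.date_range("2004-01-01", periods=3, freq="D")
lats = [-89.75, -89.25, 89.25, 89.75]
data = pd.DataFrame(np.arange(12.0).reshape(3, 4),
                    index=index, columns=lats, dtype=object)
print(data.dtypes.unique())  # every column is object

# Convert to a numeric dtype before aggregating.
numeric = data.astype(float)

# Illustrative bins (assumption): two latitude boxes instead of 72.
latedges = np.array([-90.0, 0.0, 90.0])
lats_new = np.array([-45.0, 45.0])

def lat_bucket(x):
    # Map a latitude label to the centre of the bin that contains it.
    return lats_new[np.searchsorted(latedges, x) - 1]

binned = numeric.T.groupby(lat_bucket).mean()
print(binned)
```

Checking data.dtypes before grouping is a quick way to spot this situation.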
Answer 1 (score: 7)
I got this error when generating a DataFrame consisting of timestamps and data:
df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp))
Adding the suggested solution worked for me:
df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp), dtype=float)
Thanks, Chang She!
Example:
data
2005-01-01 00:10:00 7.53
2005-01-01 00:20:00 7.54
2005-01-01 00:30:00 7.62
2005-01-01 00:40:00 7.68
2005-01-01 00:50:00 7.81
2005-01-01 01:00:00 7.95
2005-01-01 01:10:00 7.96
2005-01-01 01:20:00 7.95
2005-01-01 01:30:00 7.98
2005-01-01 01:40:00 8.06
2005-01-01 01:50:00 8.04
2005-01-01 02:00:00 8.06
2005-01-01 02:10:00 8.12
2005-01-01 02:20:00 8.12
2005-01-01 02:30:00 8.25
2005-01-01 02:40:00 8.27
2005-01-01 02:50:00 8.17
2005-01-01 03:00:00 8.21
2005-01-01 03:10:00 8.29
2005-01-01 03:20:00 8.31
2005-01-01 03:30:00 8.25
2005-01-01 03:40:00 8.19
2005-01-01 03:50:00 8.17
2005-01-01 04:00:00 8.18
data
2005-01-01 00:00:00 7.636000
2005-01-01 01:00:00 7.990000
2005-01-01 02:00:00 8.165000
2005-01-01 03:00:00 8.236667
2005-01-01 04:00:00 8.180000
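The second table holds the hourly means of the first. The answer does not show the call that produced it; one way that reproduces these numbers, assuming a current pandas, is to resample on the DatetimeIndex. This sketch uses only the first eleven readings:

```python
import pandas as pd

# First eleven readings from the example above (00:10 through 01:50).
timestamp = pd.date_range("2005-01-01 00:10:00", periods=11, freq="10min")
value = [7.53, 7.54, 7.62, 7.68, 7.81, 7.95, 7.96, 7.95, 7.98, 8.06, 8.04]

df = pd.DataFrame({"data": value}, index=pd.DatetimeIndex(timestamp), dtype=float)

# Assumption: 60-minute bins labelled by their left edge (the default here).
hourly = df.resample("60min").mean()
print(hourly)
```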
Answer 2 (score: 3)
This is what I did:
data_frame.groupby(COL1).COL2.apply(np.mean).reset_index()
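A small runnable version of that one-liner, with hypothetical column names ("station" and "reading") standing in for COL1 and COL2. Here np.mean is called on each group's values as an ordinary Python function:

```python
import numpy as np
import pandas as pd

# Hypothetical data; "station" plays the role of COL1, "reading" of COL2.
data_frame = pd.DataFrame({
    "station": ["a", "a", "b", "b"],
    "reading": [7.53, 7.54, 8.06, 8.04],
})

# Group by one column, average the other, and turn the group keys back into a column.
result = data_frame.groupby("station")["reading"].apply(np.mean).reset_index()
print(result)
```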
Answer 3 (score: 0)
I had the same problem and searched for a long time before realizing that my values were strings, not floats.
Here is what solved it for me:
df["column_name"] = pd.to_numeric(df["column_name"], downcast="float")
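A short sketch of that conversion on a hypothetical frame whose numbers were read in as strings; the "group" column and the values are made up for illustration:

```python
import pandas as pd

# Hypothetical frame: the numbers arrived as strings, so the column is not numeric.
df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "column_name": ["7.53", "7.54", "8.06"],
})

# Parse the strings; downcast="float" picks the smallest float dtype that fits.
df["column_name"] = pd.to_numeric(df["column_name"], downcast="float")
print(df["column_name"].dtype)

# With a numeric dtype, the aggregation goes through.
means = df.groupby("group")["column_name"].mean()
print(means)
```

If some entries cannot be parsed, passing errors="coerce" to pd.to_numeric turns them into NaN instead of raising.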