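I have a pandas Panel with a non-unique major_axis, and I am trying to use groupby to sum the rows that share a label, but I get an error saying that major_axis is not iterable. I have searched Stack Overflow and the message boards, but Panel does not seem to be as widely used as DataFrame.
Here is an example that produces the error: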
import pandas as pd
import datetime as dt
import dateutil.relativedelta as rd
import numpy as np
items = ['A','B']
minor_axis = ['x','y']
diff = rd.relativedelta(years=1)
major_axis = [dt.date(2013,1,1) + (diff * shift) for shift in xrange(4)] * 2
values = np.random.randn(2,8,2)
data = pd.Panel(data=values, major_axis=major_axis, minor_axis=minor_axis, items=items)
data.groupby(sum, axis='major')
Here is the stack trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-e30fb9b32fce> in <module>()
----> 1 data.groupby(sum, axis='major')
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/panel.pyc in groupby(self, function, axis)
1084 from pandas.core.groupby import PanelGroupBy
1085 axis = self._get_axis_number(axis)
-> 1086 return PanelGroupBy(self, function, axis=axis)
1087
1088 def swapaxes(self, axis1='major', axis2='minor', copy=True):
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze)
195 if grouper is None:
196 grouper, exclusions = _get_grouper(obj, keys, axis=axis,
--> 197 level=level, sort=sort)
198
199 self.grouper = grouper
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort)
1323 raise AssertionError(errmsg)
1324
-> 1325 ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
1326 groupings.append(ping)
1327
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, index, grouper, name, level, sort)
1197 # no level passed
1198 if not isinstance(self.grouper, np.ndarray):
-> 1199 self.grouper = self.index.map(self.grouper)
1200 if not (hasattr(self.grouper,"__len__") and \
1201 len(self.grouper) == len(self.index)):
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/index.pyc in map(self, mapper)
856
857 def map(self, mapper):
--> 858 return self._arrmap(self.values, mapper)
859
860 def isin(self, values):
/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.arrmap_object (pandas/algos.c:62269)()
TypeError: 'datetime.date' object is not iterable
Any ideas on how to handle this?
Many thanks,
Brendan
Answer 0 (score: 2)
In 0.12 you can try:
>>> data.groupby(np.sum, axis='major')
<pandas.core.groupby.PanelGroupBy object at 0x1a2ba50>
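For what it's worth, the TypeError at the bottom of the traceback can be reproduced without pandas at all. A function passed to groupby is mapped over the axis labels one at a time (that is the `self.index.map(self.grouper)` frame in the traceback), so the builtin `sum` receives a single `datetime.date` and tries to iterate over it. A minimal sketch of just that step; the helper name `try_builtin_sum` is made up for illustration:

```python
import datetime as dt

# One label from the question's major_axis.
label = dt.date(2013, 1, 1)


def try_builtin_sum(value):
    """Call the builtin sum on a single label and report what happens."""
    try:
        return sum(value)
    except TypeError as exc:
        # The builtin sum expects an iterable, and a date is not one.
        return "TypeError: {}".format(exc)


message = try_builtin_sum(label)
print(message)
```

So the function argument of groupby is a key function applied to each label, not the aggregation itself.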
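Answer 1 (score: 2)
@alko's answer is indeed the solution to your problem, although I think you have misunderstood groupby. You still need to apply a function or an aggregation to the result of the groupby() call; in your case, sum all the items in each group with data.groupby(..).sum().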
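To make the two steps concrete, here is a small sketch of "group first, then aggregate". It uses a DataFrame rather than a Panel, because Panel was removed from pandas in 0.25 and cannot be constructed on current versions. The names `frame`, `grouped` and `summed` are mine, and the identity lambda is only an assumed stand-in for whatever key function you pass:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Four yearly dates, repeated twice, so every index label appears two times.
dates = [dt.date(2013 + shift, 1, 1) for shift in range(4)] * 2
frame = pd.DataFrame(np.arange(16.0).reshape(8, 2), index=dates, columns=['x', 'y'])

# Step 1: the function is called once per index label and returns the group key.
# Nothing is summed yet; this only builds a GroupBy object.
grouped = frame.groupby(lambda label: label)

# Step 2: the aggregation is applied to each group.
summed = grouped.sum()
print(summed)
```

The first line alone gives you the same kind of object as in the other answer; the numbers only show up after `.sum()`.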
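But I would suggest you consider whether you need a Panel at all. Of course I don't know your situation, but in many cases a DataFrame with MultiIndex columns does the job.
Your panel and the groupby would then look like this: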
>>> items = ['A', 'A', 'B', 'B']
>>> minor_axis = ['x','y', 'x', 'y']
>>> diff = rd.relativedelta(years=1)
>>> major_axis = [dt.date(2013,1,1) + (diff * shift) for shift in xrange(4)] * 2
>>> values = np.random.randn(8,4)
>>>
>>> data = pd.DataFrame(values, index=major_axis, columns=pd.MultiIndex.from_arrays([items, minor_axis]))
>>> data
                   A                   B
                   x         y         x         y
2013-01-01 -1.063086  0.564123  0.128006 -0.658767
2014-01-01  2.182473 -0.851618  1.180264  0.165581
2015-01-01 -0.003941  0.590801 -1.616197 -2.270557
2016-01-01 -0.736524  0.172791  1.220589 -1.303294
2013-01-01 -1.052184 -1.171545 -0.473488 -0.140327
2014-01-01  0.021189  0.827241  0.775863 -0.882874
2015-01-01 -1.762289  0.705692  0.593365 -0.984109
2016-01-01 -1.946106 -1.108336 -1.691758 -0.088932
>>> data.groupby(data.index).sum()
                   A                   B
                   x         y         x         y
2013-01-01 -2.115270 -0.607422 -0.345482 -0.799094
2014-01-01  2.203662 -0.024377  1.956127 -0.717293
2015-01-01 -1.766230  1.296492 -1.022832 -3.254667
2016-01-01 -2.682630 -0.935544 -0.471170 -1.392226
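If you want to run this on Python 3, the following is a rough equivalent of the session above. It is a sketch under a few assumptions of mine: `range` replaces `xrange`, the dates are built with plain `dt.date(2013 + shift, 1, 1)` instead of `relativedelta`, a seeded `RandomState` makes the numbers repeatable, and `groupby(level=0)` is used as an alternative spelling of `groupby(data.index)`:

```python
import datetime as dt

import numpy as np
import pandas as pd

items = ['A', 'A', 'B', 'B']
minor_axis = ['x', 'y', 'x', 'y']
major_axis = [dt.date(2013 + shift, 1, 1) for shift in range(4)] * 2

# Seeded generator so the example is repeatable (the seed value is arbitrary).
values = np.random.RandomState(0).randn(8, 4)

columns = pd.MultiIndex.from_arrays([items, minor_axis])
data = pd.DataFrame(values, index=major_axis, columns=columns)

# Sum the rows that share an index label.
summed = data.groupby(level=0).sum()

# Selecting the top column level plays the role of picking a Panel item.
summed_a = summed['A']
print(summed)
```

`summed['A']` gives you a plain DataFrame with columns x and y, which is roughly what `panel['A']` would have returned.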