Hello, I have a DataFrame like this:
Value day hour min
Time
2015-12-19 10:08:52 1805 2015-12-19 10 8
2015-12-19 10:09:52 1794 2015-12-19 10 9
2015-12-19 10:19:51 1796 2015-12-19 10 19
2015-12-19 10:20:51 1806 2015-12-19 10 20
2015-12-19 10:29:52 1802 2015-12-19 10 29
2015-12-19 10:30:52 1800 2015-12-19 10 30
2015-12-19 10:40:51 1804 2015-12-19 10 40
2015-12-19 10:41:51 1798 2015-12-19 10 41
2015-12-19 10:50:51 1790 2015-12-19 10 50
2015-12-19 10:51:52 1811 2015-12-19 10 51
2015-12-19 11:00:51 1803 2015-12-19 11 0
2015-12-19 11:01:52 1784 2015-12-19 11 1
... ... ... ... ...
2016-07-15 17:30:13 1811 2016-07-15 17 30
2016-07-15 17:31:13 1787 2016-07-15 17 31
2016-07-15 17:41:13 1800 2016-07-15 17 41
2016-07-15 17:42:13 1795 2016-07-15 17 42

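I would like to group it by day and hour, and end up with a multi-dimensional array of the "Value" column.
Grouping by day and hour, I need something like this for each hour: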
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ]
2015-12-20 11 [1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]
...
2016-07-15 17 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]

In the end, I would like to have a DataFrame like this:
Time_index hour value1 value2 value3 ........value20
2015-12-19 10 1805, 1794, 1796, 1806 ... 1804, 1791, 1788, 1812
2015-12-20 11 1803, 1793, 1795, 1801 ... 1796, 1796, 1788, 1800
...
2016-07-15 17 1794, 1792, 1788, 1799 ... 1811, 1803, 1808, 1790

Or an array like this:
[[1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ],[1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]....[1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]]

I was able to get groupby working with a single column:
grouped_0 = train_df.groupby(['day'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']

The 'grouped' column of the resulting DataFrame looks like this:
2015-12-19 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
2015-12-20 [1790, 1809, 1809, 1789, 1807, 1804, 1790, 179...
2015-12-21 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179...
2015-12-22 [1815, 1812, 1798, 1808, 1802, 1788, 1808, 179...
2015-12-23 [1803, 1800, 1799, 1803, 1802, 1804, 1788, 179...
2015-12-24 [1803, 1795, 1801, 1798, 1799, 1802, 1799, 179...

However, when I try this:
grouped_0 = train_df.groupby(['day', 'hour'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']

It throws this error:
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 4036, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 3476, in aggregate
    return self._python_agg_general(arg, *args, **kwargs)
  File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 848, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2180, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2215, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce
My pandas version: pd.__version__ is '0.20.3'
Answer 0 (score: 1)
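Yes, using agg here is not the best idea: unless the result is a container holding a single object, it is treated as invalid, which is the "Function does not reduce" error above.
You can use groupby + apply instead.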
g = df.groupby(['day', 'hour']).Value.apply(lambda x: x.values.tolist())
g
day hour
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
11 [1803, 1784]
2016-07-15 17 [1811, 1787, 1800, 1795]
Name: Value, dtype: object
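To try the groupby + apply step without the full dataset, here is a minimal self-contained sketch. It assumes a small hand-built frame that uses a few rows copied from the question; deriving the day and hour columns from the index via strftime and .hour is an assumption about how the original columns were created, not something the question states.

```python
import pandas as pd

# A few rows copied from the question; the real frame is much longer.
times = pd.to_datetime([
    "2015-12-19 10:08:52", "2015-12-19 10:09:52", "2015-12-19 10:19:51",
    "2015-12-19 11:00:51", "2015-12-19 11:01:52",
    "2016-07-15 17:30:13", "2016-07-15 17:31:13",
])
df = pd.DataFrame({"Value": [1805, 1794, 1796, 1803, 1784, 1811, 1787]},
                  index=times)
df.index.name = "Time"

# Assumption: day and hour are derived from the datetime index.
df["day"] = df.index.strftime("%Y-%m-%d")
df["hour"] = df.index.hour

# apply (unlike agg in this pandas version) accepts a list per group.
g = df.groupby(["day", "hour"])["Value"].apply(lambda s: s.tolist())
print(g)
```

The result is a Series indexed by (day, hour) whose elements are plain Python lists, one list per hour.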
If you want each element in its own column, you can do this:
v = pd.DataFrame(g.values.tolist(), index=g.index)\
.rename(columns=lambda x: 'value{}'.format(x + 1)).reset_index()
v is your final result.
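The widening step can also be checked in isolation, and it covers the array form the question asks for. The sketch below assumes a hand-built Series g shaped like the output above (the first list is shortened for illustration), and the names wide, nested and arr are hypothetical. Note that hours with fewer readings are padded with NaN, so the padded columns become floats.

```python
import numpy as np
import pandas as pd

# Assumption: g mimics the grouped Series; the first list is shortened.
index = pd.MultiIndex.from_tuples(
    [("2015-12-19", 10), ("2015-12-19", 11), ("2016-07-15", 17)],
    names=["day", "hour"],
)
g = pd.Series(
    [[1805, 1794, 1796], [1803, 1784], [1811, 1787, 1800, 1795]],
    index=index,
    name="Value",
)

# One column per list position; shorter lists are padded with NaN.
wide = pd.DataFrame(g.tolist(), index=g.index)
wide.columns = ["value{}".format(i + 1) for i in wide.columns]
wide = wide.reset_index()

# Ragged list of lists, one inner list per (day, hour) group.
nested = g.tolist()

# Rectangular float array built from the value columns only.
arr = wide.filter(like="value").values
print(wide)
```

If every hour is guaranteed to hold the same number of readings, no padding occurs and arr keeps an integer dtype; otherwise the NaN entries mark the missing readings.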