Hello, I've run into a problem. My source events look like this:
   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.70
I'm trying to group the events by device_id and then attach to each event the sum / mean / std of a variable computed over its device_id:
events['latitude_mean'] = events.groupby(['device_id'])['latitude'].aggregate(np.sum)
But my output is always:
   event_id             device_id            timestamp  longitude  latitude  latitude_mean
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24            NaN
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97            NaN
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.70            NaN
3         4  -6815121365017318426  2016-05-01 00:06:40     104.27     23.28            NaN
4         5  -5373797595892518570  2016-05-01 00:07:18     115.88     28.66            NaN
What am I doing wrong that makes every row come back as NaN?
Answer (score: 4)
You can use the pandas.core.groupby.GroupBy.transform(aggfunc) method, which applies aggfunc to all rows within each group:
In [32]: events['latitude_mean'] = events.groupby(['device_id'])['latitude'].transform('sum')
In [33]: events
Out[33]:
   event_id             device_id            timestamp  longitude  latitude  latitude_mean
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24          62.55
1         2     29182687948017175  2016-05-30 12:12:12     777.77     31.31          62.55
2         3  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97          64.30
3         4  -6401643145415154744  2016-01-01 11:11:11     111.11     33.33          64.30
Here you may find some usage examples
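Since the question also asks about mean and std, the same transform idea extends naturally. Here is a minimal sketch (the latitude_sum / latitude_std column names are just illustrative, not from the answer above):

# events is the DataFrame from the question; groupby + transform broadcasts
# the per-group statistic back onto every row of that group
grouped = events.groupby('device_id')['latitude']
events['latitude_sum']  = grouped.transform('sum')   # per-device sum
events['latitude_mean'] = grouped.transform('mean')  # per-device mean
events['latitude_std']  = grouped.transform('std')   # per-device standard deviation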
Explanation: when you group a DF, the result is usually a Series with fewer rows and a different index, so pandas doesn't know how to align it when you assign it to a new column, and that's why you get NaN's:
In [31]: events.groupby(['device_id'])['latitude'].agg(np.sum)
Out[31]:
device_id
-6401643145415154744 64.30
29182687948017175 62.55
Name: latitude, dtype: float64
So, when you try to assign it to a new column, pandas effectively does the following:
In [36]: events['nans'] = pd.Series([1,2], index=['a','b'])
In [38]: events[['event_id','nans']]
Out[38]:
   event_id  nans
0         1   NaN
1         2   NaN
2         3   NaN
3         4   NaN
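If you would rather keep the aggregated, one-row-per-device Series produced by agg, you can also align it back to the original rows explicitly by looking up each row's device_id in it. A sketch of that alternative (not part of the answer above):

sums = events.groupby('device_id')['latitude'].sum()    # Series indexed by device_id
events['latitude_sum'] = events['device_id'].map(sums)  # look up each row's device_id, so values align and no NaN's appear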
Data:
In [30]: events
Out[30]:
   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2     29182687948017175  2016-05-30 12:12:12     777.77     31.31
2         3  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
3         4  -6401643145415154744  2016-01-01 11:11:11     111.11     33.33
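For completeness, a sketch of how this sample frame could be rebuilt to reproduce the snippets above (values copied from the output shown; imports assumed):

import pandas as pd

events = pd.DataFrame({
    'event_id':  [1, 2, 3, 4],
    'device_id': [29182687948017175, 29182687948017175,
                  -6401643145415154744, -6401643145415154744],
    'timestamp': pd.to_datetime(['2016-05-01 00:55:25', '2016-05-30 12:12:12',
                                 '2016-05-01 00:54:12', '2016-01-01 11:11:11']),
    'longitude': [121.38, 777.77, 103.65, 111.11],
    'latitude':  [31.24, 31.31, 30.97, 33.33],
})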