I am trying to compute, for each session, the frequency of each period in an existing DataFrame:
session time date period
1 05:51:53 2015-05-22 night
1 05:52:59 2015-05-22 night
1 06:08:24 2015-05-22 night
1 06:09:06 2015-05-22 night
1 08:25:31 2015-05-22 morning
2 08:25:35 2015-05-22 morning
2 08:26:37 2015-05-22 morning
2 08:27:11 2015-05-22 morning
2 12:33:17 2015-05-22 noon
3 12:33:45 2015-05-22 noon
to get something like this:
session time date period frequency
1 05:51:53 2015-05-22 night 4
1 05:52:59 2015-05-22 night
1 06:08:24 2015-05-22 night
1 06:09:06 2015-05-22 night
1 08:25:31 2015-05-22 morning 1
2 08:25:35 2015-05-22 morning 3
2 08:26:37 2015-05-22 morning
2 08:27:11 2015-05-22 morning
2 12:33:17 2015-05-22 noon 1
3 12:33:45 2015-05-22 noon 1
I am using this approach:
df['frequency'] = df.groupby('session', as_index=False)['period'].apply(lambda x: x.value_counts())
but I get this error: TypeError: incompatible index of inserted column with frame index
If I apply .value_counts directly to the groupby:
df['frequency'] = df.groupby('session', as_index=False)['period'].value_counts()
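I get an error saying the groupby object has no attribute value_counts.
Could you tell me how to count these categorical values and add the result as a column to the existing DataFrame at the same time? (I believed as_index=False would handle this, but apparently it does not.)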
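For context, a likely reason for the first error is that the per-group counts are indexed by (session, period) rather than by the original row labels, so pandas cannot align them with the frame. The sketch below rebuilds only the `session` and `period` columns from the sample data; the names `keys`, `counts` and `aligned` are hypothetical and used purely for illustration.

```python
import pandas as pd

# Only the two columns needed for counting, copied from the sample data.
df = pd.DataFrame({
    "session": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3],
    "period": ["night"] * 4 + ["morning"] * 4 + ["noon"] * 2,
})

keys = ["session", "period"]

# One value per (session, period) group: 5 entries with a MultiIndex,
# which does not match the 10-row index of df.
counts = df.groupby(keys).size()

# transform broadcasts the group size back onto every original row,
# so the result shares df's index and can be assigned as a column.
aligned = df.groupby(keys)["period"].transform("size")

print(counts)
print(aligned)
```

Because `aligned` has the same index as `df`, assigning it with `df['frequency'] = aligned` should not raise the index error.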
Answer 0 (score: 0)
You can group by 'session' and 'period' and use the size of each group:
In [19]: df['freq'] = df.groupby(['session', 'period'])['date'].transform(len)
In [20]: df
Out[20]:
session time date period freq
0 1 05:51:53 2015-05-22 night 4
1 1 05:52:59 2015-05-22 night 4
2 1 06:08:24 2015-05-22 night 4
3 1 06:09:06 2015-05-22 night 4
4 1 08:25:31 2015-05-22 morning 1
5 2 08:25:35 2015-05-22 morning 3
6 2 08:26:37 2015-05-22 morning 3
7 2 08:27:11 2015-05-22 morning 3
8 2 12:33:17 2015-05-22 noon 1
9 3 12:33:45 2015-05-22 noon 1
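The output above repeats the count on every row, whereas the question shows it only on the first row of each group. One possible way to get that layout is sketched below. It uses `transform('size')` in place of `transform(len)` and assumes a pandas version that supports the nullable `Int64` dtype; the names `keys` and `first_in_group` are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "session": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3],
    "time": ["05:51:53", "05:52:59", "06:08:24", "06:09:06", "08:25:31",
             "08:25:35", "08:26:37", "08:27:11", "12:33:17", "12:33:45"],
    "date": ["2015-05-22"] * 10,
    "period": ["night"] * 4 + ["morning"] * 4 + ["noon"] * 2,
})

keys = ["session", "period"]

# Group size broadcast to every row, aligned with df's index.
df["frequency"] = df.groupby(keys)["period"].transform("size")

# True for the first row of each (session, period) combination.
first_in_group = ~df.duplicated(subset=keys)

# Keep the count on the first row only; other rows become missing.
# Int64 (nullable integer) avoids the counts turning into floats.
df["frequency"] = df["frequency"].where(first_in_group).astype("Int64")

print(df)
```

Rows after the first in each group hold a missing value rather than an empty string, which keeps the column numeric.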