我正在尝试细分每分钟列出“ f”列中的条目的次数。
import pandas as pd import datetime as dt f= ['f0001', 'f0001', 'f0001', 'f0001', 'f0020', 'f0008', 'f0001', 'f0005', 'f3203', 'f0002', 'f0002', 'f0001', 'f0201', 'f0001', 'f0439', 'f0233', 'f0008', 'f0003', 'f0009', 'f0005'] dates = ['20130101100103', '20130101100110', '20130101100125', '20130101100133', '20130101100100', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100300', '20130101100300', '20130101100300', '20130101100300', '20130101100400', '20130101100400', '20130101100400', '20130101100400', '20130101100400'] d = {'date': dates} data = pd.DataFrame(d) data['user'] = f data.date = data.date.apply(str) data.date = data.date.apply(lambda x: dt.datetime.strptime(x,'%Y%m%d%H%M%S')) s = data.groupby([data.date.map(lambda t: t.minute)]).count()
但是到目前为止,我所拥有的只是下面的
s date user date 1 5 5 2 6 6 3 4 4 4 5 5
答案 0 :(得分:5)
您快到了。您只需要将data['user']
添加到您的groupby子句中。
data.groupby([[data.date.dt.minute, data['user']]).count().rename(columns={'date':'count'}).reset_index()
输出:
date user count
0 1 f0001 4
1 1 f0020 1
2 2 f0001 1
3 2 f0002 2
4 2 f0005 1
5 2 f0008 1
6 2 f3203 1
7 3 f0001 2
8 3 f0201 1
9 3 f0439 1
10 4 f0003 1
11 4 f0005 1
12 4 f0008 1
13 4 f0009 1
14 4 f0233 1
答案 1 :(得分:2)
您可以使用pd.Grouper
:
data.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()
完整示例,包括如何使用pandas
函数以更简单的方式创建数据框(我知道这不是您要讨论的话题,但是我认为以下操作比导入datetime
更加容易:
df = pd.DataFrame({'date':dates, 'user':f})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d%H%M%S')
df.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()
结果:
date user
2013-01-01 10:01:00 f0001 4
f0020 1
2013-01-01 10:02:00 f0002 2
f0001 1
f0005 1
f0008 1
f3203 1
2013-01-01 10:03:00 f0001 2
f0201 1
f0439 1
2013-01-01 10:04:00 f0003 1
f0005 1
f0008 1
f0009 1
f0233 1
Name: user, dtype: int64