Question

Suppose I have the following event data:

ts,uid
2016-02-13 20:18:03.000001 UTC,5236965070
2016-02-13 23:05:08 UTC,2834437228
2016-02-13 23:13:00.000032 UTC,2206245130
2016-02-13 22:45:07.000004 UTC,1539535012
2016-02-13 23:47:44 UTC,3431025028
2016-02-13 16:42:16.000001 UTC,810825324
2016-02-13 22:37:14 UTC,2625355144
2016-02-14 00:31:52.000009 UTC,24815453
2016-02-12 06:43:40.000007 UTC,3895095040
2016-02-14 00:09:04 UTC,715095136
...

How can I create a pivot table that counts each uid's events per hour? I tried:

df.groupby(['uid', pandas.TimeGrouper(key='ts', freq='h')], sort=False).count()

But I get ValueError: items in new_categories are not the same as in old categories. How can I make this work? Would pivot or pivot_table be a better approach?

Answer 1

It's better to use dt.hour to get the hour from the timestamp (if your column isn't a datetime yet, convert it first with pd.to_datetime):

In [90]: df.groupby([df.uid,df.ts.dt.hour]).count()
Out[90]:
               ts
uid        ts
24815453   0    1
715095136  0    1
810825324  16   1
1539535012 22   1
2206245130 23   1
2625355144 22   1
2834437228 23   1
3431025028 23   1
3895095040 6    1
5236965070 20   1
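To reproduce this end to end, here is a minimal sketch that loads the ten sample rows from the question, parses the ts strings, and groups them two ways: by dt.hour as above, and by hourly bins with pd.Grouper. It assumes pandas 1.0 or later, where TimeGrouper has been removed and pd.Grouper(key=..., freq=...) takes its place. Stripping the " UTC" suffix and parsing each value individually is just one way to cope with the mix of timestamps with and without fractional seconds.

```python
import io

import pandas as pd

# The ten sample rows from the question (the trailing "..." is left out).
CSV = """ts,uid
2016-02-13 20:18:03.000001 UTC,5236965070
2016-02-13 23:05:08 UTC,2834437228
2016-02-13 23:13:00.000032 UTC,2206245130
2016-02-13 22:45:07.000004 UTC,1539535012
2016-02-13 23:47:44 UTC,3431025028
2016-02-13 16:42:16.000001 UTC,810825324
2016-02-13 22:37:14 UTC,2625355144
2016-02-14 00:31:52.000009 UTC,24815453
2016-02-12 06:43:40.000007 UTC,3895095040
2016-02-14 00:09:04 UTC,715095136
"""

df = pd.read_csv(io.StringIO(CSV))

# Drop the literal " UTC" suffix and parse each value on its own, because the
# sample mixes timestamps with and without fractional seconds; then mark as UTC.
df["ts"] = pd.DatetimeIndex(
    [pd.Timestamp(s.replace(" UTC", "")) for s in df["ts"]]
).tz_localize("UTC")

# Hour of day (0-23): events from different dates land in the same bucket.
by_hour_of_day = df.groupby([df["uid"], df["ts"].dt.hour]).size()

# Hourly bins that keep the date; pd.Grouper stands in for the old TimeGrouper.
by_hourly_bin = df.groupby(["uid", pd.Grouper(key="ts", freq="h")]).size()

print(by_hour_of_day)
print(by_hourly_bin)
```

The choice between the two depends on the question being asked: dt.hour answers "at what time of day does each uid act", while pd.Grouper answers "how many events did each uid have in each specific hour".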

Note that groupby "consumes" the uid column (it becomes an index level instead of a data column); if you want to avoid this, pass as_index=False.
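A sketch of both layouts, assuming the hour is first stored in a helper column named hour (a name chosen here for illustration): as_index=False keeps uid and hour as ordinary columns in long format, and pivot_table with aggfunc="count" reshapes the same counts into one row per uid and one column per hour. It uses only four of the sample rows to keep the output small.

```python
import io

import pandas as pd

# Four of the sample rows from the question.
CSV = """ts,uid
2016-02-13 23:05:08 UTC,2834437228
2016-02-13 23:13:00.000032 UTC,2206245130
2016-02-13 16:42:16.000001 UTC,810825324
2016-02-14 00:31:52.000009 UTC,24815453
"""

df = pd.read_csv(io.StringIO(CSV))

# Parse each value separately (mixed fractional seconds), then mark as UTC.
df["ts"] = pd.DatetimeIndex(
    [pd.Timestamp(s.replace(" UTC", "")) for s in df["ts"]]
).tz_localize("UTC")

# Hypothetical helper column so that every group key is a real column.
df = df.assign(hour=df["ts"].dt.hour)

# Long format: as_index=False keeps uid and hour as regular columns.
long_counts = (
    df.groupby(["uid", "hour"], as_index=False)["ts"]
    .count()
    .rename(columns={"ts": "events"})
)

# Wide format: one row per uid, one column per hour, 0 where there are no events.
wide_counts = df.pivot_table(
    index="uid", columns="hour", values="ts", aggfunc="count", fill_value=0
)

print(long_counts)
print(wide_counts)
```

Plain pivot does not aggregate (it raises on duplicate uid/hour pairs), so for counting, pivot_table, or groupby followed by unstack, is the better fit.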

How to create a pivot table from a resampled column?
