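I am working with a large dataset in which each instance comes with a timestamp. All the data is loaded into a DataFrame. A short snippet of the entries: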
2015-05-12 14:35:49
2015-05-13 09:56:48
2015-05-07 11:01:15
2015-05-13 11:00:04
2015-05-05 13:21:27
I want to group the data into one-hour intervals and count the entries in each. So the result of the task should look like this:
Time Interval Count
08:00-09:00 2
09:00-10:00 3
10:00-11:00 4
Is there an efficient way to do this in Python?
Answer 0 (score: 1)
Try:
df.groupby(df['date'].map(lambda x: x.hour))
Example:
import pandas as pd
times = [
'2015-05-01 14:05:49',
'2015-05-12 14:35:49',
'2015-05-13 09:56:48',
'2015-05-07 11:01:15',
'2015-05-13 11:00:04',
'2015-05-23 11:30:04',
'2015-05-05 13:21:27',
]
df = pd.DataFrame(pd.to_datetime(times), columns=['date'])
print(df.groupby(df['date'].map(lambda x: x.hour)).describe())
Output:
                           date
date
9    count                     1
     unique                    1
     top    2015-05-13 09:56:48
     freq                      1
     first  2015-05-13 09:56:48
     last   2015-05-13 09:56:48
11   count                     3
     unique                    3
     top    2015-05-07 11:01:15
     freq                      1
     first  2015-05-07 11:01:15
     last   2015-05-23 11:30:04
13   count                     1
     unique                    1
     top    2015-05-05 13:21:27
     freq                      1
     first  2015-05-05 13:21:27
     last   2015-05-05 13:21:27
14   count                     2
     unique                    2
     top    2015-05-01 14:05:49
     freq                      1
     first  2015-05-01 14:05:49
     last   2015-05-12 14:35:49
Answer 1 (score: 0)
You can parse the timestamps, take only the hour, and count your intervals by updating entries in a dictionary. See https://eval.in/511344
times = [
'2015-05-12 14:35:49',
'2015-05-13 09:56:48',
'2015-05-07 11:01:15',
'2015-05-13 11:00:04',
'2015-05-05 13:21:27',
]
intervals = {}
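for t in times:
    # Characters 11-12 of 'YYYY-MM-DD HH:MM:SS' are the two-digit hour.
    hr = t[11:13]
    if hr not in intervals:
        intervals[hr] = 0
    intervals[hr] += 1
# Raw dictionary of counts, keyed by the hour string.
print(intervals)
for k in sorted(intervals.keys()):
    print("%s:00-%s:00 %s" % (k, int(k) + 1, intervals[k]))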
The loop prints:
# 09:00-10:00 1
# 11:00-12:00 2
# 13:00-14:00 1
# 14:00-15:00 1
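That said, judging by @MaxU's answer, it seems you are working with Python/pandas, which I did not actually take into account in my answer. The approach is the same: you group the collection by hour.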
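The same group-by-hour idea can also be sketched with the standard library alone. This variant is an assumption about how one might tidy up the dictionary loop, not part of the original answer: it parses each timestamp with datetime.strptime instead of slicing the string, and uses collections.Counter for the counting. The name `lines` is made up for illustration.

```python
from collections import Counter
from datetime import datetime

times = [
    '2015-05-12 14:35:49',
    '2015-05-13 09:56:48',
    '2015-05-07 11:01:15',
    '2015-05-13 11:00:04',
    '2015-05-05 13:21:27',
]

# Parse each timestamp and count occurrences per hour of day.
counts = Counter(
    datetime.strptime(t, '%Y-%m-%d %H:%M:%S').hour for t in times
)

# One formatted line per hour that occurs, in ascending order.
lines = [
    "%02d:00-%02d:00 %d" % (h, (h + 1) % 24, counts[h])
    for h in sorted(counts)
]
print("\n".join(lines))
```

Parsing the timestamp makes a malformed entry fail with a ValueError instead of being silently counted under a wrong key, and the modulo keeps the 23:00 interval ending at 00:00 rather than 24:00.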