Here is the data I have:
timediff
2018-06-19 01:00:00
2018-06-19 01:00:01
2018-06-19 01:00:02
2018-06-19 01:00:03
2018-06-19 02:00:00
2018-06-19 02:00:01
2018-06-19 02:00:02
2018-06-19 02:00:03
2018-06-19 02:15:00
2018-06-19 02:15:01
2018-06-19 02:15:02
2018-06-19 02:15:03
2018-06-19 02:30:00
2018-06-19 02:30:01
2018-06-19 02:30:02
2018-06-19 02:30:03
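I want to create a group identifier for each group of times in the data. Timestamps that fall within 4-5 seconds of each other should share the same identifier.
The output should look like this: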
timediff identifier
2018-06-19 01:00:00 1
2018-06-19 01:00:01 1
2018-06-19 01:00:02 1
2018-06-19 01:00:03 1
2018-06-19 02:00:00 2
2018-06-19 02:00:01 2
2018-06-19 02:00:02 2
2018-06-19 02:00:03 2
2018-06-19 02:15:00 3
2018-06-19 02:15:01 3
2018-06-19 02:15:02 3
2018-06-19 02:15:03 3
2018-06-19 02:30:00 4
2018-06-19 02:30:01 4
2018-06-19 02:30:02 4
2018-06-19 02:30:03 4
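Since the entries in each burst are within 4-5 seconds of each other, I want them grouped together, and I want every such group identified in the same way.
I am new to Python and not sure how to do this.
Can anyone help me?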
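Answer 0: (score: 1)
I have previously grouped timestamps into five-minute intervals. Timestamps that fall in the same interval produce the same group_key:
group_key = int(timestamp / interval) * interval
A given group_key means the time lies in the half-open interval [group_key, group_key + interval).
For example:
interval is 5 seconds
group_key | timestamp | time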
1529341200 1529341200 '2018-06-19 01:00:00'
1529341200 1529341201 '2018-06-19 01:00:01'
1529341200 1529341202 '2018-06-19 01:00:02'
1529341200 1529341203 '2018-06-19 01:00:03'
1529341200 1529341204 '2018-06-19 01:00:04'
1529341205 1529341205 '2018-06-19 01:00:05'
1529341205 1529341206 '2018-06-19 01:00:06'
1529341205 1529341207 '2018-06-19 01:00:07'
1529341205 1529341208 '2018-06-19 01:00:08'
1529341205 1529341209 '2018-06-19 01:00:09'
1529341210 1529341210 '2018-06-19 01:00:10'
1529341210 1529341211 '2018-06-19 01:00:11'
1529341210 1529341212 '2018-06-19 01:00:12'
1529341210 1529341213 '2018-06-19 01:00:13'
1529341210 1529341214 '2018-06-19 01:00:14'
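The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `group_key` is taken from the formula in this answer, and the function signature is an assumption made for illustration.

```python
def group_key(timestamp: int, interval: int) -> int:
    """Return the start of the fixed-width bucket that contains timestamp."""
    # Floor division rounds down to a multiple of interval
    return timestamp // interval * interval


# The same 15 consecutive Unix timestamps as in the table, with a 5-second interval
keys = [group_key(ts, 5) for ts in range(1529341200, 1529341215)]
for ts, key in zip(range(1529341200, 1529341215), keys):
    print(key, ts)
```

Note that the buckets are aligned to multiples of the interval, so two timestamps only one second apart (1529341204 and 1529341205 in the table) can still land in different groups.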
For your data:
import time

datetimes = ['2018-06-19 01:00:00', '2018-06-19 01:00:01', '2018-06-19 01:00:02', '2018-06-19 01:00:03',
             '2018-06-19 02:00:00', '2018-06-19 02:00:01', '2018-06-19 02:00:02', '2018-06-19 02:00:03',
             '2018-06-19 02:15:00', '2018-06-19 02:15:01', '2018-06-19 02:15:02', '2018-06-19 02:15:03',
             '2018-06-19 02:30:00', '2018-06-19 02:30:01', '2018-06-19 02:30:02', '2018-06-19 02:30:03']
time_interval = 5  # bucket width in seconds

print("timediff identifier")
for dt in datetimes:
    # Parse the string as local time and convert it to a Unix timestamp
    timestamp = int(time.mktime(time.strptime(dt, '%Y-%m-%d %H:%M:%S')))
    # Round down to the start of the bucket that contains the timestamp
    identifier = timestamp // time_interval * time_interval
    print("'" + dt + "'", identifier)
Note that the identifier is not 1, 2, 3, 4 but the start timestamp of the group, which I think is more meaningful. If you must have 1, 2, 3, 4, a further conversion is needed.
Output:
timediff identifier
'2018-06-19 01:00:00' 1529341200
'2018-06-19 01:00:01' 1529341200
'2018-06-19 01:00:02' 1529341200
'2018-06-19 01:00:03' 1529341200
'2018-06-19 02:00:00' 1529344800
'2018-06-19 02:00:01' 1529344800
'2018-06-19 02:00:02' 1529344800
'2018-06-19 02:00:03' 1529344800
'2018-06-19 02:15:00' 1529345700
'2018-06-19 02:15:01' 1529345700
'2018-06-19 02:15:02' 1529345700
'2018-06-19 02:15:03' 1529345700
'2018-06-19 02:30:00' 1529346600
'2018-06-19 02:30:01' 1529346600
'2018-06-19 02:30:02' 1529346600
'2018-06-19 02:30:03' 1529346600
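The further conversion to 1, 2, 3, 4 mentioned in this answer could look roughly like the following. It is a sketch, not part of the original answer: the function name `sequential_ids` is hypothetical, and the input list is a shortened copy of the group keys printed above.

```python
def sequential_ids(keys):
    """Number distinct keys 1, 2, 3, ... in order of first appearance."""
    first_seen = {}
    ids = []
    for key in keys:
        if key not in first_seen:
            # A key we have not met yet gets the next free number
            first_seen[key] = len(first_seen) + 1
        ids.append(first_seen[key])
    return ids


# Shortened sample of the group keys from the output above
group_keys = [1529341200, 1529341200, 1529344800, 1529344800, 1529345700, 1529346600]
print(sequential_ids(group_keys))  # [1, 1, 2, 2, 3, 4]
```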
Answer 1: (score: 0)
The help you are looking for is in Python's datetime module, specifically the datetime.timedelta class.
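Given two datetime instances in Python, you can simply subtract one from the other to get their difference, which is returned as a datetime.timedelta instance:
import datetime

# Parse a couple of datetimes...
t1 = datetime.datetime.strptime('2018-06-19 14:23:14', '%Y-%m-%d %H:%M:%S')
t2 = datetime.datetime.strptime('2018-06-19 14:23:16', '%Y-%m-%d %H:%M:%S')
diff = t2 - t1  # Get the timedelta
if diff.seconds < 4:
    # t1 and t2 are in the same "group"
    print('same group')
The timedelta's .seconds attribute gives you the seconds component of the difference between the two datetimes. Whole days are held separately in .days, so total_seconds() is the safer choice when timestamps can be a day or more apart.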
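As a small illustration of how the .seconds attribute behaves, the sketch below subtracts two datetimes that are one day and two seconds apart. The example values are made up for the demonstration; only standard `datetime` calls are used.

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
t1 = datetime.strptime('2018-06-19 14:23:14', fmt)
t2 = datetime.strptime('2018-06-20 14:23:16', fmt)  # one day and two seconds later

diff = t2 - t1
print(diff.days)             # 1
print(diff.seconds)          # 2
print(diff.total_seconds())  # 86402.0
```

For timestamps that are always less than a day apart the two give the same number, but `total_seconds()` stays correct when a gap spans a day or more.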
Given that, you can loop over your list of datetime strings and group them (assuming the timestamps are already in order):
import datetime

datetimes = ['2018-06-19 14:23:14', '2018-06-19 14:23:16', '2018-06-19 14:23:27', '2018-06-19 14:23:28', '2018-06-19 14:23:29']
# For collecting the groups
grouped_datetimes = []
# Assumes the datetimes are already in order; if not, you can sort them beforehand
min_ts = datetime.datetime.strptime(datetimes[0], '%Y-%m-%d %H:%M:%S')
group = [datetimes[0]]
for dt in datetimes[1:]:
    ts = datetime.datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
    diff = ts - min_ts
    # total_seconds() also counts whole days, which .seconds alone would ignore
    if diff.total_seconds() < 4:
        group.append(dt)
    else:
        grouped_datetimes.append(group)
        group = [dt]
        min_ts = ts
# Add the last group that was built up
if group:
    grouped_datetimes.append(group)
for index, group in enumerate(grouped_datetimes):
    for ts in group:
        print(f'{ts}\t{index}')
This will output:
2018-06-19 14:23:14	0
2018-06-19 14:23:16	0
2018-06-19 14:23:27	1
2018-06-19 14:23:28	1
2018-06-19 14:23:29	1
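That is just a quick and dirty solution. You can certainly improve on it depending on your exact use case. Hopefully it gives you an idea of how to use timedelta to solve this.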
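One possible refinement, sketched here under assumptions, is to compare each timestamp with the previous one rather than with the first member of its group, and to emit 1-based identifiers as in the question. The function name `assign_identifiers` and the `max_gap` parameter are hypothetical, and the 5-second threshold is taken from the "4-5 seconds" wording of the question; this is a variant, not the method used in the answer above.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'


def assign_identifiers(datetimes, max_gap=timedelta(seconds=5)):
    """Pair each datetime string with a 1-based group identifier.

    A new group starts whenever the gap to the previous timestamp exceeds max_gap.
    """
    result = []
    identifier = 0
    previous = None
    # Strings in this fixed-width format sort chronologically
    for dt in sorted(datetimes):
        ts = datetime.strptime(dt, FMT)
        if previous is None or ts - previous > max_gap:
            identifier += 1
        result.append((dt, identifier))
        previous = ts
    return result


# A shortened sample of the data from the question
data = ['2018-06-19 01:00:00', '2018-06-19 01:00:01',
        '2018-06-19 02:00:00', '2018-06-19 02:00:03',
        '2018-06-19 02:15:00']
for dt, identifier in assign_identifiers(data):
    print(dt, identifier)
```

Because timedelta objects are compared directly, gaps longer than a day are handled without touching `.seconds` at all.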