Creating a group identifier for timestamps

Time: 2018-06-19 22:39:40

Tags: python python-2.7

Here is the data I have:

  timediff
2018-06-19 01:00:00
2018-06-19 01:00:01
2018-06-19 01:00:02
2018-06-19 01:00:03
2018-06-19 02:00:00
2018-06-19 02:00:01
2018-06-19 02:00:02
2018-06-19 02:00:03
2018-06-19 02:15:00
2018-06-19 02:15:01
2018-06-19 02:15:02
2018-06-19 02:15:03
2018-06-19 02:30:00
2018-06-19 02:30:01
2018-06-19 02:30:02
2018-06-19 02:30:03

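I want to create a group identifier for each group of times in the data. Entries that fall within 4-5 seconds of each other should share one identifier.

The output should look like this: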

timediff                identifier
2018-06-19 01:00:00          1
2018-06-19 01:00:01          1
2018-06-19 01:00:02          1
2018-06-19 01:00:03          1
2018-06-19 02:00:00          2
2018-06-19 02:00:01          2
2018-06-19 02:00:02          2
2018-06-19 02:00:03          2
2018-06-19 02:15:00          3
2018-06-19 02:15:01          3
2018-06-19 02:15:02          3
2018-06-19 02:15:03          3
2018-06-19 02:30:00          4
2018-06-19 02:30:01          4
2018-06-19 02:30:02          4
2018-06-19 02:30:03          4

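Since every entry within a burst is within 4-5 seconds of the others, I want those entries grouped together. In the same way, I want to identify all of the groups.

I am new to Python and not sure how to do this.

Can someone help me?

2 answers:

Answer 0 (score: 1)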

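I have previously grouped timestamps into fixed intervals (five minutes in my case). Timestamps that fall in the same interval produce the same group_key:

group_key = int(timestamp / interval) * interval

A group_key means the time lies in the interval [group_key, group_key + interval).

For example:

interval is 5 seconds
group_key | timestamp | time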
1529341200 1529341200 '2018-06-19 01:00:00'
1529341200 1529341201 '2018-06-19 01:00:01'
1529341200 1529341202 '2018-06-19 01:00:02'
1529341200 1529341203 '2018-06-19 01:00:03'
1529341200 1529341204 '2018-06-19 01:00:04'

1529341205 1529341205 '2018-06-19 01:00:05'
1529341205 1529341206 '2018-06-19 01:00:06'
1529341205 1529341207 '2018-06-19 01:00:07'
1529341205 1529341208 '2018-06-19 01:00:08'
1529341205 1529341209 '2018-06-19 01:00:09'

1529341210 1529341210 '2018-06-19 01:00:10'
1529341210 1529341211 '2018-06-19 01:00:11'
1529341210 1529341212 '2018-06-19 01:00:12'
1529341210 1529341213 '2018-06-19 01:00:13'
1529341210 1529341214 '2018-06-19 01:00:14'
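
The bucketing rule above can be sketched as a small function. This is only an illustration: `bucket_start` is a hypothetical helper name, not something from the answer, and it assumes integer epoch seconds. Floor division (`//`) is used so the result is the same on Python 2 and Python 3.

```python
def bucket_start(timestamp, interval):
    """Return the start of the [start, start + interval) bucket holding timestamp."""
    return (timestamp // interval) * interval


# Epoch seconds taken from the table above.
keys = [bucket_start(ts, 5) for ts in (1529341200, 1529341204, 1529341205, 1529341214)]
print(keys)  # [1529341200, 1529341200, 1529341205, 1529341210]
```

Buckets are aligned to multiples of the interval, so a burst that straddles a boundary (for example 01:00:03 to 01:00:06) would be split across two keys.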

For your problem:

import time

datetimes = [
    '2018-06-19 01:00:00', '2018-06-19 01:00:01', '2018-06-19 01:00:02', '2018-06-19 01:00:03',
    '2018-06-19 02:00:00', '2018-06-19 02:00:01', '2018-06-19 02:00:02', '2018-06-19 02:00:03',
    '2018-06-19 02:15:00', '2018-06-19 02:15:01', '2018-06-19 02:15:02', '2018-06-19 02:15:03',
    '2018-06-19 02:30:00', '2018-06-19 02:30:01', '2018-06-19 02:30:02', '2018-06-19 02:30:03',
]

time_interval = 5  # bucket width in seconds

print("timediff               identifier")
for dt in datetimes:
    # time.mktime interprets the parsed time in the local timezone
    timestamp = int(time.mktime(time.strptime(dt, '%Y-%m-%d %H:%M:%S')))
    # Round down to the start of the bucket
    identifier = (timestamp // time_interval) * time_interval
    print("'%s' %d" % (dt, identifier))

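Note that the identifier here is not 1, 2, 3, 4 but the start timestamp of the group, which I think is more meaningful. If you must have 1, 2, 3, 4, a further conversion is needed.

Output (the exact values depend on the local timezone, because time.mktime works in local time):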

timediff               identifier
'2018-06-19 01:00:00' 1529341200
'2018-06-19 01:00:01' 1529341200
'2018-06-19 01:00:02' 1529341200
'2018-06-19 01:00:03' 1529341200
'2018-06-19 02:00:00' 1529344800
'2018-06-19 02:00:01' 1529344800
'2018-06-19 02:00:02' 1529344800
'2018-06-19 02:00:03' 1529344800
'2018-06-19 02:15:00' 1529345700
'2018-06-19 02:15:01' 1529345700
'2018-06-19 02:15:02' 1529345700
'2018-06-19 02:15:03' 1529345700
'2018-06-19 02:30:00' 1529346600
'2018-06-19 02:30:01' 1529346600
'2018-06-19 02:30:02' 1529346600
'2018-06-19 02:30:03' 1529346600
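
If sequential identifiers (1, 2, 3, 4) are required, the further conversion mentioned in this answer could look like the following sketch. It assumes the keys are processed in the order in which they should be numbered; `number_groups` is a hypothetical helper, and the keys are the start timestamps from the output above.

```python
def number_groups(keys):
    """Map each distinct key to 1, 2, 3, ... in order of first appearance."""
    ids = {}
    result = []
    for key in keys:
        if key not in ids:
            ids[key] = len(ids) + 1  # next unused identifier
        result.append(ids[key])
    return result


# Start-of-bucket timestamps copied from the output above (four rows per group).
keys = [1529341200] * 4 + [1529344800] * 4 + [1529345700] * 4 + [1529346600] * 4
print(number_groups(keys))
```

Each row then carries the identifier 1 to 4 in the same order as the input rows.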

Answer 1 (score: 0)

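The help you are looking for is in Python's datetime module, specifically the datetime class.

Given two datetime instances in Python, you can simply subtract one from the other to get their difference, which is returned as a datetime.timedelta instance:

import datetime

# Parse a couple of datetimes...
t1 = datetime.datetime.strptime('2018-06-19 14:23:14', '%Y-%m-%d %H:%M:%S')
t2 = datetime.datetime.strptime('2018-06-19 14:23:16', '%Y-%m-%d %H:%M:%S')

diff = t2 - t1  # Get the timedelta
if diff.seconds < 4:
    print('same group')  # t1 and t2 are in the same "group"

The timedelta's .seconds attribute gives you the seconds component of the difference between the two datetimes. It does not include whole days, so diff.total_seconds() is the safer choice if two timestamps can be more than a day apart.

Given that, you can iterate through your list of datetime strings and group them, printing each timestamp with its group index (this assumes the timestamps are already in order):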

import datetime

datetimes = ['2018-06-19 14:23:14', '2018-06-19 14:23:16', '2018-06-19 14:23:27', '2018-06-19 14:23:28', '2018-06-19 14:23:29']

# For collecting the groups
grouped_datetimes = []

# Assumes the datetimes are already in order; if not, you can sort them beforehand
min_ts = datetime.datetime.strptime(datetimes[0], '%Y-%m-%d %H:%M:%S')
group = [datetimes[0]]
for dt in datetimes[1:]:
    ts = datetime.datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
    diff = ts - min_ts
    if diff.total_seconds() < 4:  # total_seconds() also counts whole days
        group.append(dt)
    else:
        grouped_datetimes.append(group)
        group = [dt]
        min_ts = ts

# Add the last group that was built up
if group:
    grouped_datetimes.append(group)


for index, group in enumerate(grouped_datetimes):
    for ts in group:
        # %-formatting also runs on Python 2.7 (the question's tag); f-strings need 3.6+
        print('%s\t%d' % (ts, index))

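That is just a quick and dirty solution. Depending on your exact use case, you can certainly improve on it. Hopefully it shows how timedelta can be used to solve the problem.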
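
One possible refinement, sketched under a few assumptions: compare each timestamp with the previous one rather than with the first one in its group, use total_seconds() so that gaps longer than a day are measured correctly, and number the groups from 1 as in the question. `assign_identifiers` and `max_gap_seconds` are hypothetical names chosen for illustration, and the 5-second threshold is taken from the "4-5 seconds" in the question.

```python
import datetime

FMT = '%Y-%m-%d %H:%M:%S'


def assign_identifiers(datetimes, max_gap_seconds=5):
    """Return (timestamp string, identifier) pairs, identifiers starting at 1.

    A new group starts whenever the gap to the previous timestamp exceeds
    max_gap_seconds. The input is sorted first, so its order does not matter.
    """
    parsed = sorted(datetime.datetime.strptime(dt, FMT) for dt in datetimes)
    result = []
    identifier = 0
    previous = None
    for ts in parsed:
        if previous is None or (ts - previous).total_seconds() > max_gap_seconds:
            identifier += 1  # gap too large: open a new group
        result.append((ts.strftime(FMT), identifier))
        previous = ts
    return result


sample = ['2018-06-19 01:00:00', '2018-06-19 01:00:01',
          '2018-06-19 02:00:00', '2018-06-19 02:00:03',
          '2018-06-19 02:15:00']
for ts, identifier in assign_identifiers(sample):
    print('%s\t%d' % (ts, identifier))
```

Because each timestamp is compared with its predecessor, a long run of closely spaced entries stays in one group even if the run as a whole spans more than the threshold. Whether that is desirable depends on the data.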