I use the following code to find clusters in a list where the difference is <= 1:
from itertools import groupby
from operator import itemgetter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
for k, g in groupby(enumerate(data), lambda (i, x): (i-x)):
    print map(itemgetter(1), g)
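For reference, the same idea in Python 3 (which removed tuple parameters in lambda and the print statement) might look like the sketch below; `consecutive_runs` is a hypothetical helper name. The key is index minus value: inside a run of consecutive integers both go up by 1 at each step, so the key stays constant and groupby keeps the run together. Strictly speaking this finds runs whose steps are exactly 1; a repeated value (difference 0) changes the key and starts a new group.

```python
from itertools import groupby

data = [1, 4, 5, 6, 10, 15, 16, 17, 18, 22, 25, 26, 27, 28]

def consecutive_runs(values):
    """Hypothetical helper: split values into runs where each item is previous + 1."""
    runs = []
    # enumerate yields (i, x); i - x is constant along a run of consecutive integers.
    for _, group in groupby(enumerate(values), key=lambda pair: pair[0] - pair[1]):
        runs.append([x for _, x in group])
    return runs

print(consecutive_runs(data))
# [[1], [4, 5, 6], [10], [15, 16, 17, 18], [22], [25, 26, 27, 28]]
```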
However, if I change data to an array of datetimes, to find clusters of datetimes that are only 1 hour apart, it fails.
Here is what I'm trying:
>>> data
array([datetime.datetime(2016, 10, 1, 8, 0),
datetime.datetime(2016, 10, 1, 9, 0),
datetime.datetime(2016, 10, 1, 10, 0), ...,
datetime.datetime(2019, 1, 3, 9, 0),
datetime.datetime(2019, 1, 3, 10, 0),
datetime.datetime(2019, 1, 3, 11, 0)], dtype=object)
from itertools import groupby
from operator import itemgetter
# data is the datetime array shown above
for k, g in groupby(enumerate(data), lambda (i, x): (i-x).total_seconds()/3600):
    print map(itemgetter(1), g)
The error is:
for k, g in groupby(enumerate(data), lambda (i, x): int((i-x).total_seconds()/3600)):
TypeError: unsupported operand type(s) for -: 'int' and 'datetime.datetime'
There are many solutions online, but I want to use this particular one for learning purposes.
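One way to keep this exact groupby trick for datetimes is to turn the index into a time offset and use `x - i * timedelta(hours=1)` as the key instead of `i - x`. A datetime minus a timedelta is a datetime, so the types line up, and the key stays the same as long as each item is exactly one hour after the previous one. This is a minimal sketch, assuming `data` is sorted; the helper name `hourly_runs` is hypothetical, and the sample uses only the six timestamps visible in the array above (the elided middle is left out).

```python
from datetime import datetime, timedelta
from itertools import groupby

def hourly_runs(timestamps, step=timedelta(hours=1)):
    """Hypothetical helper: group sorted datetimes into runs spaced exactly `step` apart."""
    runs = []
    # x - i*step is constant while each timestamp is exactly `step` after the previous one.
    for _, group in groupby(enumerate(timestamps), key=lambda pair: pair[1] - pair[0] * step):
        runs.append([x for _, x in group])
    return runs

data = [
    datetime(2016, 10, 1, 8, 0),
    datetime(2016, 10, 1, 9, 0),
    datetime(2016, 10, 1, 10, 0),
    datetime(2019, 1, 3, 9, 0),
    datetime(2019, 1, 3, 10, 0),
    datetime(2019, 1, 3, 11, 0),
]
for run in hourly_runs(data):
    print(run[0], "->", run[-1], f"({len(run)} items)")
# 2016-10-01 08:00:00 -> 2016-10-01 10:00:00 (3 items)
# 2019-01-03 09:00:00 -> 2019-01-03 11:00:00 (3 items)
```

Because the key only stays constant for exact one-hour steps, a 30-minute gap or a duplicate timestamp starts a new group; the pairwise approach in the answer instead treats any gap of up to one hour as part of the same run.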
Answer:
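If you want to get all subsequences of items such that each item is one hour later than the previous one (rather than clusters in which every item lies within an hour of the others), you need to iterate over pairs (data[i-1], data[i]). Currently you are iterating over (i, data[i]), which raises a TypeError when you try to subtract data[i] (a datetime) from i (an int). A working example could look like this: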
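def find_subsequences(data):
    # Return runs of at least two timestamps in which each timestamp
    # is at most one hour after the previous one.
    if len(data) <= 1:
        return []
    current_group = [data[0]]
    delta = 3600  # maximum allowed gap, in seconds
    results = []
    # zip works on both Python 2 and 3 (izip no longer exists in Python 3).
    for current, nxt in zip(data, data[1:]):
        if abs((nxt - current).total_seconds()) > delta:
            # Here, `current` is the last item of the previous subsequence
            # and `nxt` is the first item of the next subsequence.
            if len(current_group) >= 2:
                results.append(current_group)
            current_group = [nxt]
            continue
        current_group.append(nxt)
    # The loop never closes the final group, so add it here.
    if len(current_group) >= 2:
        results.append(current_group)
    return results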