Question

I'm struggling with this, because I'm sure a dozen for-loops is not the right way to solve it:

There is a sorted list of numbers, such as

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

I want to create a dict of lists of numbers, in which the difference between consecutive numbers is no more than 15. So the output would be:

clusters = {
    1 : [123, 124, 128],
    2 : [160, 167],
    3 : [213, 215, 230, 245, 255, 257],
    4 : [400, 401, 402],
    5 : [430]
}

My current solution is a bit ugly (I have to remove duplicates at the end...), and I'm sure it can be done in a pythonic way.

This is what I do now:

clusters = {}  
dIndex = 0 
for i in range(len(numbers)-1) :
    if numbers[i+1] - numbers[i] <= 15 :
        if not clusters.has_key(dIndex) : clusters[dIndex] = []
        clusters[dIndex].append(numbers[i])
        clusters[dIndex].append(numbers[i+1])
    else : dIndex += 1

Answer 1

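Not strictly necessary if your list is small, but I'd probably approach this in a "stream processing" fashion: define a generator that takes your input iterable and yields the elements grouped into runs of numbers differing by <= 15. Then you can use that to generate your dictionary easily.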

def grouper(iterable):
    prev = None
    group = []
    for item in iterable:
        # compare against None explicitly so that a value of 0 is not mistaken for "no previous item"
        if prev is None or item - prev <= 15:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
dict(enumerate(grouper(numbers), 1))

prints:

{1: [123, 124, 128],
 2: [160, 167],
 3: [213, 215, 230, 245, 255, 257],
 4: [400, 401, 402],
 5: [430]}

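As a bonus, this even lets you group runs from potentially infinite iterables (as long as they are sorted, of course). You could also move the index generation into the generator itself (instead of using enumerate) as a minor enhancement.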
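The minor enhancement mentioned above could look roughly like the sketch below. The names numbered_groups and max_gap are made up for illustration and are not part of the original answer; the generator yields (index, group) pairs so that dict() can consume it directly.

```python
def numbered_groups(iterable, max_gap=15):
    """Yield (index, group) pairs, numbering groups from 1.

    Hypothetical variant of grouper(); the name and the max_gap
    parameter are assumptions made for this sketch.
    """
    index = 1
    group = []
    for item in iterable:
        # Start a new group when the gap to the previous item is too large.
        if group and item - group[-1] > max_gap:
            yield index, group
            index += 1
            group = []
        group.append(item)
    # Emit the last group, unless the input was empty.
    if group:
        yield index, group


numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
clusters = dict(numbered_groups(numbers))
print(clusters)
```

Comparing against group[-1] instead of a separate prev variable also sidesteps the question of how to mark "no previous item yet".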

Answer 2

import itertools
import numpy as np

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430])
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)]

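# pairwise trick: advance a second iterator by one to pair each start index with its stop index
starts, stops = itertools.tee(nd)
next(stops, None)
res = {}
# zip replaces Python 2's itertools.izip, which no longer exists in Python 3
for j, (start, stop) in enumerate(zip(starts, stops)):
    res[j] = numbers[start:stop]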

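That is, if you can use itertools and numpy. The iterator trick is adapted from the pairwise recipe. The +1 is needed to shift the index, and adding 0 and len(numbers) to the list makes sure the first and last entries are included correctly.

You can obviously do this without itertools, but I like tee.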
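For comparison, here is a rough sketch of the same boundary-index idea without tee. To keep it dependency-free it also swaps np.diff / np.where for a plain list comprehension, so it is not the answer's exact method; the names max_gap, cuts and bounds are assumptions made for this example.

```python
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
max_gap = 15  # assumed name for the threshold

# Index i + 1 starts a new cluster whenever the gap to the previous number
# is too large (this is the "+1" shift described above).
cuts = [i + 1 for i in range(len(numbers) - 1)
        if numbers[i + 1] - numbers[i] > max_gap]

# Add 0 and len(numbers) so the first and last clusters are included.
bounds = [0] + cuts + [len(numbers)]

# Pair each boundary with the next one by zipping two slices instead of using tee.
res = {j: numbers[start:stop]
       for j, (start, stop) in enumerate(zip(bounds[:-1], bounds[1:]))}
print(res)
```

Slicing bounds twice copies the list, which is harmless here because it only holds one entry per cluster boundary.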

Answer 3

Use generators to separate the logic (one function does one thing):

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

def cut_indices(numbers):
    # Yield each index at which the list needs to be 'cut'.
    for i in range(len(numbers) - 1):
        if numbers[i+1] - numbers[i] > 15:
            yield i+1

def splitter(numbers):
    # Split the original list into sublists at the cut indices.
    px = 0
    for x in cut_indices(numbers):
        yield numbers[px:x]
        px = x
    yield numbers[px:]

def cluster(numbers):
    # Number the sublists from 1 to form a dict. enumerate avoids the
    # off-by-one of range(1, len(numbers)), which is empty for a one-element list.
    return dict(enumerate(splitter(numbers), 1))

print(cluster(numbers))

The above code gives me

{1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]}

Answer 4

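Here is a relatively simple solution that works for either lists or generators. It lazily yields pairs (group_number, element), so you have to do the actual grouping separately if you need it. (Or maybe you just need the group number.)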

from itertools import tee

def group(xs, gap=15):
    # use `tee` to get two efficient iterators
    xs1, xs2 = tee(xs)

    # the first element is in group 0, also advance the second iterator
    group = 0
    try:
        first = next(xs2)
    except StopIteration:
        return  # empty input: there is nothing to yield
    yield (group, first)

    # after advancing xs2, this zip is pairs of consecutive elements
    for x, y in zip(xs1, xs2):
        # whenever the gap is too large, increment the group number
        if y - x > gap:
            group += 1
        # and yield the second number in the pair
        yield group, y
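If the actual grouping is needed, one possible way to collect the pairs is sketched below. The generator is restated so the block runs on its own, with the local counter renamed to group_number and a guard for empty input added; the helper name collect is an assumption for this example and not part of the original answer.

```python
from collections import defaultdict
from itertools import tee


def group(xs, gap=15):
    # Restated from the answer above so this sketch is self-contained.
    xs1, xs2 = tee(xs)
    group_number = 0
    try:
        first = next(xs2)
    except StopIteration:
        return  # empty input
    yield group_number, first
    for x, y in zip(xs1, xs2):
        if y - x > gap:
            group_number += 1
        yield group_number, y


def collect(pairs):
    """Gather (group_number, element) pairs into {group_number: [elements]}."""
    clusters = defaultdict(list)
    for group_number, element in pairs:
        clusters[group_number].append(element)
    return dict(clusters)


numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
clusters = collect(group(numbers))
print(clusters)
```

Note that the group numbers start at 0 here, unlike the 1-based keys in the question.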

Answer 5

You can get an implementation without (explicit) loops using numpy / pandas:

import pandas as pd    
import numpy as np

n = 15
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
nnumbers = np.array(numbers)
clusters = pd.DataFrame({
    'numbers': numbers,
    'segment': np.cumsum([0] + list(1*(nnumbers[1:] - nnumbers[0:-1] > n))) + 1
}).groupby('segment').agg({'numbers': set}).to_dict()['numbers']

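The trick is to shift the list of numbers and compare the differences against the threshold (15) to find the "breaks" between segments. The first element is, of course, never a break. Then use the cumsum function to get the segment numbers, and do the group-by with the set function (in case there are duplicates). Hope this is helpful even though many years have passed since this question was posted.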
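To make the shift-and-cumsum step easier to follow, here is a small pure-Python sketch of just that part. It uses itertools.accumulate as a stand-in for np.cumsum; the names breaks and segments are assumptions made for this example.

```python
from itertools import accumulate

n = 15
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

# Pair each number with its successor (the "shift") and flag a break
# with 1 when the gap exceeds the threshold. The first element never breaks.
breaks = [0] + [int(b - a > n) for a, b in zip(numbers, numbers[1:])]

# The running sum of the break flags, plus 1, is the segment number of each element.
segments = [s + 1 for s in accumulate(breaks)]
print(segments)
# [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]
```

Each entry of segments lines up with the number at the same position, which is what the groupby on the 'segment' column relies on.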
