Question

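I'm processing a lot of rows, and I want to group them according to whether a value x in the current row is within 100 of the x value in the previous row.

For example: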

5, "hello"
10, "was"
60, "bla"
5000, "qwerty"

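"hello", "was" and "bla" should be one group, "qwerty" another.

Is there a neat way to solve this with groupby? All the solutions I can think of feel a bit hackish, such as using a dict default argument that holds the previous value and updating it each time the key function is called by groupby.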
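For concreteness, the workaround described above might look roughly like the sketch below. The names `rows` and `near_previous` are made up for illustration, and the threshold is assumed to be inclusive (a gap of exactly 100 stays in the same group).

```python
import itertools

# Sample rows from the question: (x, label) pairs, already in order.
rows = [(5, "hello"), (10, "was"), (60, "bla"), (5000, "qwerty")]

def near_previous(row, state={"last": None, "key": 0}):
    # The mutable default dict survives between calls. That is the "hack":
    # the state is shared by every call, so the function is only safe for
    # a single pass over the data.
    x = row[0]
    if state["last"] is not None and abs(x - state["last"]) > 100:
        state["key"] += 1  # gap too large: start a new group
    state["last"] = x
    return state["key"]

groups = [
    [label for _, label in group]
    for _, group in itertools.groupby(rows, near_previous)
]
print(groups)  # [['hello', 'was', 'bla'], ['qwerty']]
```

It works, but the hidden shared state is exactly what makes it feel fragile.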

Answer 1

You could write a simple class to encapsulate the temporary variables, then use a method of that class as the key function:

class KeyClass(object):
    def __init__(self):
        self.lastValue = None
        self.currentKey = 1

    def handleVal(self, val):
        if self.lastValue is not None and abs(val - self.lastValue) > 100:
            self.currentKey += 1
        self.lastValue = val
        return self.currentKey

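>>> import itertools
>>> data = [1, 2, 100, 105, 300, 350, 375, 500, 800, 808]  # input implied by the output below
>>> [(k, list(g)) for k, g in itertools.groupby(data, KeyClass().handleVal)]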
[(1, [1, 2, 100, 105]), (2, [300, 350, 375]), (3, [500]), (4, [800, 808])]
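The example above groups bare numbers. For rows shaped like the ones in the question, one option is to wrap the method in a small lambda that picks out x first. This is only a sketch: `rows`, `keyer` and `grouped` are illustrative names, and the rows are assumed to be (x, label) tuples.

```python
import itertools

class KeyClass(object):
    def __init__(self):
        self.lastValue = None
        self.currentKey = 1

    def handleVal(self, val):
        # Start a new group whenever the jump from the previous value exceeds 100.
        if self.lastValue is not None and abs(val - self.lastValue) > 100:
            self.currentKey += 1
        self.lastValue = val
        return self.currentKey

rows = [(5, "hello"), (10, "was"), (60, "bla"), (5000, "qwerty")]

keyer = KeyClass()  # use a fresh instance for every pass over the data
grouped = [
    (k, [label for _, label in g])
    for k, g in itertools.groupby(rows, lambda row: keyer.handleVal(row[0]))
]
print(grouped)  # [(1, ['hello', 'was', 'bla']), (2, ['qwerty'])]
```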

Just for fun, I also came up with this rather mind-bending way to do it, using the send method of a pre-advanced generator as the key function:

def keyGen():
    curKey = 1
    newVal = yield None
    while True:
        oldVal, newVal = newVal, (yield curKey)
        if oldVal is None or abs(newVal-oldVal) > 100:
            curKey += 1

key = keyGen()
next(key)

>>> [(k, list(g)) for k, g in itertools.groupby(data, key.send)]
[(1, [1, 2, 100, 105]), (2, [300, 350, 375]), (3, [500]), (4, [800, 808])]

Wrapping your head around this one is probably a good exercise in understanding .send (it was for me!).
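If it helps, here is the same generator traced one call at a time, outside of groupby. The input values and the names `primed` and `keys` are just for illustration.

```python
def keyGen():
    curKey = 1
    newVal = yield None  # the priming next() stops here and returns None
    while True:
        # The right-hand side is evaluated left to right: the current newVal
        # is saved first, then the generator pauses at the yield, handing
        # curKey back to the caller. The value passed to the next send()
        # becomes the new newVal.
        oldVal, newVal = newVal, (yield curKey)
        if oldVal is None or abs(newVal - oldVal) > 100:
            curKey += 1

key = keyGen()
primed = next(key)  # advance to the first yield so that send() is allowed
keys = [key.send(v) for v in [1, 2, 100, 105, 300]]
print(primed, keys)  # None [1, 1, 1, 1, 2]
```

Each send(v) resumes the generator with v as the value of the paused yield expression, runs until the next yield, and returns what that yield produces. That is why key.send can stand in for an ordinary one-argument key function.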

Answer 2

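There may be some clever tricks with itertools.groupby, but it's simple enough to write a custom generator function for your specific problem. Perhaps something like this (untested):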

def grouper(it):
    group = []
    for item in it:
        if not group or abs(int(item[0]) - int(group[-1][0])) <= 100:
            group.append(item)
        else:
            yield group
            group = [item]
    if group:  # yield final group if not empty
        yield group

Usage would be something like:

with open(filename) as fid:
    for group in grouper(line.split(',') for line in fid):
        # do something with group
        for item in group:
            # do something with item
            pass
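To try this without a file on disk, something like the following should work. It is a sketch under a few assumptions: the sample text mirrors the rows in the question, "within 100" is treated as inclusive, the `max_gap` parameter is a hypothetical addition, and csv.reader replaces the plain split so that the quotes and the space after the comma are stripped.

```python
import csv
import io

def grouper(it, max_gap=100):
    group = []
    for item in it:
        # Compare against the last row already in the current group.
        if not group or abs(int(item[0]) - int(group[-1][0])) <= max_gap:
            group.append(item)
        else:
            yield group
            group = [item]
    if group:  # yield the final group if it is not empty
        yield group

# In-memory stand-in for the input file.
sample = io.StringIO('5, "hello"\n10, "was"\n60, "bla"\n5000, "qwerty"\n')

# skipinitialspace=True lets the reader see the quote right after ", ".
rows = csv.reader(sample, skipinitialspace=True)
groups = [[label for _, label in group] for group in grouper(rows)]
print(groups)  # [['hello', 'was', 'bla'], ['qwerty']]
```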

