Question

I'm trying to split a list of lists into blocks based on the type of data (NoneType or int) at the second index of each sublist.

Sample data:

arr = [
[81, None, None],
[82, None, None],
[83, None, None],
[84, None, None],
[85, 161, 360],
[86, 161, 360],
[87, 161, 360],
[88, 160, 360],
[89, 160, 360],
[90, 160, 360],
[91, 160, 360],
[92, 160, 360],
[93, None, None],
[94, None, None],
[95, None, None],
[96, 153, 359],
[97, 153, 359],
[98, 153, 359],
[99, 153, 359]]

As I said, this can be treated either as a list of lists or as a numpy array (i.e. numpy.array(arr)), whichever is easier.

I'm trying to get something like this (it doesn't need to be identical):

[(81, 84, None),                   # or [[None, None], [None, None]...] ... either is fine.
 (85, 92, [[161, 360], [161, 360]]...),
 (93, 95, None),
 (96, 99, [[153, 359], [153, 359]]...)
]

My sloppy attempt:

none_end = 0
none_start = False
blocks_loc = list()
for i in arr:
    if None in i:
        if not none_start:
            none_start = i[0]
        none_end = i[0]
    elif None not in i and none_start is not False:
        blocks_loc.append((none_start, none_end))
        none_start = False
        none_end = 0

I can then simply extract the data based on blocks_loc (which now contains [(81, 84), (93, 95)]).

However, it's hard to overstate how horrible and ugly this code is. Something better would be great.

Answer 1

I'd use itertools.groupby:

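from itertools import groupby
# Group consecutive rows that share the same value at index 1.
groups = (list(g) for k, g in groupby(arr, key=lambda x: x[1]))
# For each group: (first id, last id, remaining columns of every row).
final = [(g[0][0], g[-1][0], [x[1:] for x in g]) for g in groups]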

which gives me

>>> pprint.pprint(final)
[(81, 84, [[None, None], [None, None], [None, None], [None, None]]),
 (85, 87, [[161, 360], [161, 360], [161, 360]]),
 (88, 92, [[160, 360], [160, 360], [160, 360], [160, 360], [160, 360]]),
 (93, 95, [[None, None], [None, None], [None, None]]),
 (96, 99, [[153, 359], [153, 359], [153, 359], [153, 359]])]

..I just noticed that I used x[1] as the index to group on, whereas you wanted x[2]. Well, that's left as an exercise for the reader. ;-)
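Grouping on the value itself splits the 161 rows from the 160 rows, as the output above shows. If the goal is to split purely on type (None versus int), one option is to group on whether the value is None. This is a minimal sketch using the sample data from the question; the names `blocks` and `payload` are only illustrative.

```python
from itertools import groupby

arr = [
    [81, None, None], [82, None, None], [83, None, None], [84, None, None],
    [85, 161, 360], [86, 161, 360], [87, 161, 360],
    [88, 160, 360], [89, 160, 360], [90, 160, 360], [91, 160, 360], [92, 160, 360],
    [93, None, None], [94, None, None], [95, None, None],
    [96, 153, 359], [97, 153, 359], [98, 153, 359], [99, 153, 359],
]

blocks = []
# The key is True for None rows and False for int rows, so consecutive
# rows of the same kind end up in the same group.
for is_none, group in groupby(arr, key=lambda row: row[1] is None):
    rows = list(group)
    # Collapse None blocks to a single None; keep the data columns otherwise.
    payload = None if is_none else [row[1:] for row in rows]
    blocks.append((rows[0][0], rows[-1][0], payload))

for block in blocks:
    print(block[0], block[1], block[2])
```

With this key, the rows 85 through 92 form a single block even though the value changes from 161 to 160 partway through.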

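If you want finer control over the output (for example, to handle the case where the start and end indices are the same), just loop over the key/group pairs returned by groupby and yield whatever you like.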
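As a rough sketch of that idea, the generator below loops over the key/group pairs and yields a shorter tuple when a group has only one row. The function name `iter_blocks`, the small `rows` sample, and the choice of output shape for single-row groups are all assumptions made for illustration, not part of the original answer.

```python
from itertools import groupby


def iter_blocks(rows, key):
    """Yield one tuple per run of consecutive rows with the same key."""
    for _, group in groupby(rows, key=key):
        group = list(group)
        start, end = group[0][0], group[-1][0]
        data = [row[1:] for row in group]
        if start == end:
            # Single-row group: an arbitrary choice here is to drop the end id.
            yield (start, data)
        else:
            yield (start, end, data)


# Hypothetical sample with two single-row groups and one two-row group.
rows = [[1, None, None], [2, 5, 6], [3, 5, 6], [4, 7, 8]]
result = list(iter_blocks(rows, key=lambda row: row[1]))
print(result)
```

Any other output shape works the same way: change what the generator yields in each branch.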

Also note that groupby finds contiguous groups. If your data isn't necessarily contiguous, you can sort first.
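A minimal sketch of sorting before grouping follows. Python 3 cannot compare None with int, so it assumes a helper sort key (here called `sort_key`, a made-up name) that places None rows last. Since sorting breaks up the run of ids, the example collects the ids per group instead of a start/end pair.

```python
from itertools import groupby

# Hypothetical data where equal values are not adjacent.
rows = [[1, 5, 6], [2, None, None], [3, 5, 6], [4, None, None], [5, 7, 8]]


def sort_key(row):
    # Sort ints by value and push None rows to the end; the 0 is a
    # placeholder so None is never compared with an int.
    return (row[1] is None, 0 if row[1] is None else row[1])


ordered = sorted(rows, key=sort_key)
# After sorting, rows with the same value are adjacent, so groupby
# produces exactly one group per distinct value.
grouped = [(value, [row[0] for row in group])
           for value, group in groupby(ordered, key=lambda row: row[1])]
print(grouped)
```

Because sorted is stable, the ids inside each group keep their original relative order.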
