Question

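I have a long list of dicts as my dataset (each row in the list is a dictionary).

Some rows in this list need to be removed (because the data in those rows is inconsistent with the rest of the dataset).

I have already written a function that identifies the index numbers of the rows I want to remove, like this: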

indices_to_remove = [10200, 15006, 22833, 33442, 54214]

I would like a function that removes every row whose index appears in this list.

This is what I have tried so far:

my_original_dataset = [...]  # placeholder: a list of dicts

indices_to_remove = [10200, 15006, 22833, 33442, 54214]

def remove_missing_rows(dataset):
    new_list = []
    for row_dict in dataset:
        if row_dict not in indices_to_remove:
            new_list.append(row_dict)
    return new_list

new_dataset_all_empty_removed = remove_missing_rows(my_original_dataset)

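I realize the problem is that row_dict refers to the actual row rather than the row's index number, but I don't know how to refer to the row number here.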

Answer 1

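You can use enumerate to produce the index alongside the row itself. You can also speed up each index lookup by turning the list of indices into a set; sets are optimized for membership checks: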

indices_to_remove = {10200, 15006, 22833, 33442, 54214}

def remove_missing_rows(dataset):
    new_list = []
    for i, row_dict in enumerate(dataset):
        if i not in indices_to_remove:
            new_list.append(row_dict)
    return new_list

You can also use a list comprehension directly, without defining a function:

new_list = [x for i, x in enumerate(dataset) if i not in indices_to_remove]

This creates a new list that leaves out every item whose index is in indices_to_remove.
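As a minimal, self-contained sketch of the same idea, the indices can be passed in as a parameter instead of being read from a global. The function name remove_rows and the toy rows below are made up for illustration; they are not part of the question.

```python
def remove_rows(dataset, indices_to_remove):
    """Return a new list without the rows at the given positions."""
    # Convert once to a set so each membership check is O(1) on average.
    drop = set(indices_to_remove)
    return [row for i, row in enumerate(dataset) if i not in drop]


# Hypothetical toy data standing in for the real list of dicts.
rows = [{"id": n} for n in range(6)]
cleaned = remove_rows(rows, [1, 4])
print([r["id"] for r in cleaned])  # [0, 2, 3, 5]
```

The original list is left untouched, which is usually the safer choice when the unfiltered data may still be needed.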

Answer 2

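To literally delete the rows from the dataset in place, dataset.pop(i) works fine.

You have to pop from the end toward the start, so indices_to_remove needs to be sorted in ascending order before it is reversed, or you have to handle the ordering explicitly.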

dataset = [1,2,3,4,5]
indices_to_remove = [1,3]

[dataset.pop(i) for i in indices_to_remove[::-1]]

dataset

Out[195]: [1, 3, 5]

The output of the list comprehension can be ignored; all you want is the "side effect" of removing the rows from the list.

As suggested:

for i in indices_to_remove[::-1]:
    dataset.pop(i)

is arguably "cleaner".
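If the indices may arrive unsorted or contain duplicates, one way to handle the ordering explicitly is to deduplicate and sort them in descending order first. This is only a sketch; the helper name pop_rows_in_place is hypothetical.

```python
def pop_rows_in_place(dataset, indices_to_remove):
    """Delete the rows at the given positions from dataset itself."""
    # Deduplicate, then go from the highest index to the lowest so that
    # deleting one row never shifts a row that is still waiting to be deleted.
    for i in sorted(set(indices_to_remove), reverse=True):
        del dataset[i]


data = [1, 2, 3, 4, 5]
pop_rows_in_place(data, [3, 1, 3])  # unsorted, with a repeated index
print(data)  # [1, 3, 5]
```

Note that each deletion from the middle of a list shifts the elements after it, so for a large number of removals building a new list with enumerate is likely to be faster.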

Answer 3

I think that instead of 'if row_dict not in indices_to_remove:' on line 8 of your code, the removal will work with 'if dataset.index(row_dict) not in indices_to_remove:'.
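One caveat with this approach: list.index returns the position of the first element that compares equal, and it scans the list on every call. If the dataset contains duplicate rows, it can therefore drop more rows than intended. The toy data below is made up to show the difference from the enumerate version.

```python
# Rows 0 and 2 are equal dicts; only row 0 should be removed.
dataset = [{"a": 1}, {"a": 2}, {"a": 1}]
indices_to_remove = {0}

# dataset.index(row) is 0 for both equal rows, so both are dropped.
via_index = [row for row in dataset
             if dataset.index(row) not in indices_to_remove]

# enumerate tracks the real position, so only row 0 is dropped.
via_enumerate = [row for i, row in enumerate(dataset)
                 if i not in indices_to_remove]

print(via_index)      # [{'a': 2}]
print(via_enumerate)  # [{'a': 2}, {'a': 1}]
```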

