
Question

I have a dictionary like this:

{'A': 0, 'B': 1, 'C': 2, 'D': 3, etc}

If the dictionary is unordered, how can I remove elements from it without creating gaps in the values?

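Example:

I have a large matrix where rows represent words and columns represent the documents in which those words occur. I store the words and their corresponding row indices as a dictionary. E.g. for a matrix with four rows, the dictionary would look like: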

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

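If I remove the words 'apple' and 'banana', the matrix will contain only two rows. So the value of 'orange' in the dictionary should now be 0 instead of 1, and the value of 'pear' should be 1 instead of 3.

In Python 3.6+ dictionaries are ordered, so I could write something like this to reassign the values: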

i = 0
for k in words:
    words[k] = i  # assign through the key; rebinding a loop variable would not change the dict
    i += 1

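or

words = dict(zip(words.keys(), range(matrix.shape[0])))

I think this is far from the most efficient way to change the values, and it would not work with unordered dictionaries. How can this be done efficiently? Is there an easy way to reassign the values if the dictionary is not ordered?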

Answer 1

Turn the dict into a sorted list, then build a new dict without the words you want to remove:

import itertools

to_remove = {'apple', 'banana'}

# Step 1: sort the words
ordered_words = [None] * len(words)
for word, index in words.items():
    ordered_words[index] = word
# ordered_words: ['apple', 'orange', 'banana', 'pear']

# Step 2: Remove unwanted words and create a new dict
counter = itertools.count()
words = {word: next(counter) for word in ordered_words if word not in to_remove}
# result: {'orange': 0, 'pear': 1}

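This has a runtime of O(n), because ordering the list manually with indexing operations is linear, whereas using sorted would be O(n log n).

See also the documentation for itertools.count and next.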

Answer 2

You can use your existing logic, working on a sorted representation of the dictionary:

import operator

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
sorted_words = sorted(words.items(), key=operator.itemgetter(1))

for i, (k, v) in enumerate(sorted_words):
    words[k] = i
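The loop above renumbers the values but does not remove anything. A minimal sketch of how the removal could be combined with the same sorted approach follows; the set `to_remove` and the intermediate name `kept` are assumptions made for illustration and are not part of the answer.

```python
import operator

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
to_remove = {'apple', 'banana'}  # hypothetical: the words to drop

# Keep only the surviving (word, index) pairs, ordered by their old index.
kept = sorted(
    ((k, v) for k, v in words.items() if k not in to_remove),
    key=operator.itemgetter(1),
)

# Renumber the survivors 0..n-1 in that order.
words = {k: i for i, (k, v) in enumerate(kept)}
print(words)  # {'orange': 0, 'pear': 1}
```

Because the order comes from the stored indices rather than from insertion order, this does not rely on the dict being ordered. The sort makes it O(n log n).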

Answer 3

Initially we have

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

To reorder from the smallest value to the largest, you can use sorted and a dictionary comprehension.

std = sorted(words, key=lambda x: words[x])

Is this okay?
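The answer mentions a dictionary comprehension without showing one. A possible sketch, assuming the unwanted words are deleted first and that `enumerate` supplies the new indices (the name `new_words` is hypothetical):

```python
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

# Remove the unwanted words; this leaves gaps in the values (1 and 3).
del words['apple']
del words['banana']

# Keys ordered by their old index.
std = sorted(words, key=lambda x: words[x])

# Dictionary comprehension that assigns consecutive indices in that order.
new_words = {word: i for i, word in enumerate(std)}
print(new_words)  # {'orange': 0, 'pear': 1}
```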

Answer 4

You are using the wrong tool (dict) for the job; you should use a list:

class vocabulary:
    def __init__(self, *words):
        self.words = list(words)

    def __getitem__(self, key):
        # A word's index is simply its position in the list.
        try:
            return self.words.index(key)
        except ValueError:
            print(key + " is not in vocabulary")

    def remove(self, word):
        # Accepts either an index or a word.
        if isinstance(word, int):
            del self.words[word]
            return
        return self.remove(self[word])

words = vocabulary("apple", "banana", "orange")
print(words["banana"])  # outputs 1
words.remove("apple")
print(words["banana"])  # outputs 0

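A note on complexity

I got several comments saying that a dict is more efficient because its lookup time is O(1), while the lookup time of a list is O(n).

In this case, that is simply not true.

The O(1) guarantee of a hash table (dict in Python) is the result of amortized complexity, meaning that you average over the common usage of a lookup table that is generated once, assuming that your hash function is balanced.

This amortized calculation does not account for throwing away the entire dictionary and regenerating it every time you remove an item, as some of the other answers suggest.

The list implementation and the dict implementation have the same worst-case complexity of O(n).

However, the list implementation can be optimized with two lines of Python (bisect) to bring the worst-case lookup complexity down to O(log(n)).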
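The bisect optimization is not spelled out in the answer. One way it might look is sketched below, under the loud assumption that the word list is kept in sorted (alphabetical) order, so that a word's index is its rank in that order rather than its insertion position. The class name `SortedVocabulary` is hypothetical.

```python
import bisect


class SortedVocabulary:
    """Hypothetical variant: words are kept sorted, so a word's index
    is its rank in alphabetical order."""

    def __init__(self, *words):
        self.words = sorted(words)

    def __getitem__(self, word):
        # Binary search over the sorted list: O(log n) comparisons.
        i = bisect.bisect_left(self.words, word)
        if i < len(self.words) and self.words[i] == word:
            return i
        raise KeyError(word)

    def remove(self, word):
        # Locating the word is O(log n); the deletion itself still
        # shifts every element that follows it.
        del self.words[self[word]]


vocab = SortedVocabulary("apple", "banana", "orange", "pear")
print(vocab["orange"])  # 2
vocab.remove("apple")
vocab.remove("banana")
print(vocab["orange"], vocab["pear"])  # 0 1
```

Note that this changes which index each word gets compared with the question's example (there, 'orange' is 1 and 'banana' is 2), so it only fits if the matrix rows can follow the same sorted order.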

Answer 5

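You could always keep an inverted dictionary that maps indices to words, and use it as a reference for preserving the order of the original dictionary. Then you can remove the words and rebuild the dictionary: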

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

# reverse dict for index -> word mappings
inverted = {i: word for word, i in words.items()}

remove = {'apple', 'banana'}

# sort/remove the words
new_words = [inverted[i] for i in range(len(inverted)) if inverted[i] not in remove]

# rebuild new dictionary
new_dict = {word: i for i, word in enumerate(new_words)}

print(new_dict)

Which outputs:

{'orange': 0, 'pear': 1}

Note: Like the accepted answer, this is also O(n).

Reassigning dictionary values

5 answers: