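I read the following trie implementation in Python: https://stackoverflow.com/a/11016430/2225221
and tried to write a remove function for it. I ran into problems right at the start: a word you want to delete may have sub-words, or it may itself be a "sub-word" of another word.
If you delete with "del dict[key]", you also delete both kinds of words mentioned above. Can anyone help me correctly remove just the chosen word (let's assume it is in the trie)?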
Answer 0 (score: 3)
Basically, to remove a word from the trie (as implemented in the answer you linked), you only need to delete its _end marker, for example:
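_end = '_end_'  # end-of-word marker, as defined in the linked answer

def remove_word(trie, word):
    current_dict = trie
    for letter in word:
        current_dict = current_dict.get(letter, None)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        # pop() rather than del: no KeyError if `word` is only a prefix of a stored word.
        current_dict.pop(_end, None)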
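Note, however, that this does not keep the trie at its minimal size. After a word is deleted, the trie may still contain leftover branches that no word uses any more. This does not affect the correctness of the data structure; it only means the trie may consume more memory than strictly necessary. You can improve on this by iterating backwards from the leaf node and deleting branches until you reach a node that has more than one child.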
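EDIT: Here is an idea for how you could implement a remove function that also culls any unnecessary branches. There is probably a more efficient way to do it, but this might get you started: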
def remove_word2(trie, word):
    current_dict = trie
    path = [current_dict]
    for letter in word:
        current_dict = current_dict.get(letter, None)
        path.append(current_dict)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        if not path[-1].get(_end, None):
            # the trie doesn't contain this word (but a prefix of it).
            return
        deleted_branches = []
        for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
            if len(current_dict[letter]) <= 1:
                deleted_branches.append((current_dict, letter))
            else:
                break
        if len(deleted_branches) > 0:
            del deleted_branches[-1][0][deleted_branches[-1][1]]
        del path[-1][_end]
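Basically, it first finds the "path" of the word that is about to be deleted, then iterates backwards through that path to find the nodes that can be removed. It then deletes the root of the removable part of the path (which implicitly deletes the _end node as well).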
Answer 1 (score: 1)
I think it is better to do this recursively. The code is as follows:
def remove(self, word):
    self.delete(self.tries, word, 0)

def delete(self, dicts, word, i):
    if i == len(word):
        if 'end' in dicts:
            del dicts['end']
            if len(dicts) == 0:
                return True
            else:
                return False
        else:
            return False
    else:
        if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
            if len(dicts[word[i]]) == 0:
                del dicts[word[i]]
                return True
            else:
                return False
        else:
            return False
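The two methods above refer to self.tries and an 'end' key, but the surrounding class is not shown. The following is a minimal self-contained sketch of how such a class might look. The class name Trie, the insert and contains methods, and the condensed _delete helper are assumptions made for illustration; only the tries attribute and the 'end' key come from the answer.

```python
class Trie:
    """Hypothetical wrapper; `tries` and the 'end' key follow the answer's code."""

    def __init__(self):
        self.tries = {}

    def insert(self, word):
        node = self.tries
        for letter in word:
            node = node.setdefault(letter, {})
        node['end'] = True  # marks a complete word

    def contains(self, word):
        node = self.tries
        for letter in word:
            if letter not in node:
                return False
            node = node[letter]
        return 'end' in node

    def remove(self, word):
        self._delete(self.tries, word, 0)

    def _delete(self, node, word, i):
        """Return True if `node` became empty, so the caller may prune it."""
        if i == len(word):
            if 'end' not in node:
                return False  # the word is not stored
            del node['end']
            return not node
        child = node.get(word[i])
        if child is not None and self._delete(child, word, i + 1):
            del node[word[i]]  # the child is empty: prune it
            return not node
        return False


t = Trie()
t.insert('car')
t.insert('cart')
t.remove('cart')
print(t.tries)  # {'c': {'a': {'r': {'end': True}}}}
```

Removing 'cart' prunes only the t node, because the r node still carries the 'end' marker for 'car'.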
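Answer 2 (score: 0)
One way to deal with structures like this is recursion. The nice thing about recursion in this case is that it descends to the bottom of the trie and then passes the returned values back up through the branches.
The following function does just that. It goes to the leaf and deletes the _end value, in case the input word is a prefix of another word. It then passes up a boolean (boo) indicating that current_dict is still in an outlying branch. Once we reach a point where the current dict has more than one child, we delete the appropriate branch and set boo to False, so that each remaining level of the recursion does nothing.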
def trie_trim(term, trie=SYNONYMS, prev=0):
    # checks that we haven't hit the end of the word
    if term:
        first, rest = term[0], term[1:]
        current_length = len(trie)
        next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)
        # this statement avoids trimming excessively if the input is a prefix because
        # if the word is a prefix, the first returned value will be greater than 1
        if boo and next_length > 1:
            boo = False
        # this statement checks for the first occurrence of the current dict having more than one child
        # or it checks that we've hit the bottom without trimming anything
        elif boo and (current_length > 1 or not prev):
            del trie[first]
            boo = False
        return current_length, boo
    # when we do hit the end of the word, delete _end
    else:
        del trie[_end]
        return len(trie) + 1, True
Answer 3 (score: 0)
def remove_a_word_util(self, word, idx, node):
    if len(word) == idx:
        node.is_end_of_word = False
        return bool(node.children)
    ch = word[idx]
    if ch not in node.children:
        return True
    flag = self.remove_a_word_util(word, idx+1, node.children[ch])
    if flag:
        return True
    node.children.pop(ch)
    return bool(node.children) or node.is_end_of_word
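This method is posted without its surrounding classes. It appears to return True when the visited node must be kept and False when the caller may drop it. The sketch below shows one way it could be wired up. TrieNode, Trie, insert, search and remove_a_word are assumed names for illustration; only remove_a_word_util and the children / is_end_of_word attributes come from the answer.

```python
class TrieNode:
    def __init__(self):
        self.children = {}           # maps a character to a child TrieNode
        self.is_end_of_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end_of_word

    def remove_a_word(self, word):
        self.remove_a_word_util(word, 0, self.root)

    def remove_a_word_util(self, word, idx, node):
        # Returns True if `node` is still needed, False if it can be pruned.
        if len(word) == idx:
            node.is_end_of_word = False
            return bool(node.children)
        ch = word[idx]
        if ch not in node.children:
            return True  # word not present: keep everything
        flag = self.remove_a_word_util(word, idx + 1, node.children[ch])
        if flag:
            return True
        node.children.pop(ch)
        return bool(node.children) or node.is_end_of_word


trie = Trie()
for w in ('car', 'cart', 'dog'):
    trie.insert(w)
trie.remove_a_word('cart')
print(trie.search('car'), trie.search('cart'))  # True False
```

Here the t node is pruned while the c -> a -> r chain stays, because r still ends the word 'car'.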