Python - Traversing a vocabulary tree to find words in a dictionary

Time: 2017-11-08 15:41:52

Tags: python recursion tree tree-traversal

Suppose we have a tree structure in which each node has three attributes: token, children, and entries / ids.
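For readers who want something concrete to run, here is a minimal sketch of such a node. The question does not show the real class, so the `Node` layout, the `build_tree` helper, and the meaning of `entries` (ids of dictionary entries that end exactly at a node) are all assumptions made for illustration.

```python
class Node:
    """One node of the vocabulary tree (hypothetical layout)."""

    def __init__(self, token=None):
        self.token = token      # word stored at this node (None for the root)
        self.children = {}      # maps a child token to its Node
        self.entries = set()    # assumed: ids of entries that end exactly here


def build_tree(phrases):
    """Insert each phrase (a sequence of tokens) and return the root node."""
    root = Node()
    for entry_id, phrase in enumerate(phrases):
        node = root
        for token in phrase:
            # Reuse the child if it exists, otherwise create it.
            node = node.children.setdefault(token, Node(token))
        node.entries.add(entry_id)
    return root


root = build_tree([("comic", "book"), ("comic", "book", "art")])
book = root.children["comic"].children["book"]
print(book.token, sorted(book.children), sorted(book.entries))
```

Here the `book` node both ends an entry (`comic book`) and has a child (`art`), which is the situation the early-stopping check in the question has to handle.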

A visualization of an example tree (showing the tokens) might look like this: (image not included)

Task: given a context window, for example

['take', 'a', 'long', 'bath', 'after', 'my', 'flight']

I want to generate all candidates related to the first token, take, which are

[(take, a, bath), (take, after), (take, flight)]
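One way to read the task is: a candidate is any dictionary entry that starts with the first token and whose remaining words occur later in the window, in order, with gaps allowed. Under that reading (an assumption, since the question does not state it formally), a brute-force reference can be sketched as below. The function name `brute_force_candidates` and the small `dictionary` set are hypothetical and chosen only to match the expected output above. It is exponential in the window length, so it is only useful for checking a faster implementation on short windows.

```python
from itertools import combinations


def brute_force_candidates(tokens, dictionary):
    """Return every dictionary entry that starts with tokens[0] and whose
    remaining words appear later in the window, in order (gaps allowed)."""
    first, rest = tokens[0], tokens[1:]
    found = []
    for size in range(len(rest) + 1):
        # combinations() keeps the original order of the window.
        for combo in combinations(rest, size):
            candidate = (first,) + combo
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


# Hypothetical dictionary, chosen to match the expected output in the question.
dictionary = {("take", "a", "bath"), ("take", "after"), ("take", "flight")}
window = ["take", "a", "long", "bath", "after", "my", "flight"]
print(brute_force_candidates(window, dictionary))
```

The result contains the same three candidates as the expected output, although shorter candidates are listed first because the search goes by length.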

Note that the following code handles early stopping (see Example 1: comic book and comic book art):

diff = len(node.entries) - len(node.children)
if diff == 1 and node.children != {}: 

What I have tried

def traverse(self, tokens, node, tokens_matched):
    diff = len(node.entries) - len(node.children)
    if diff == 1 and node.children != {}:
        yield tokens_matched
    if node.children == {}:
        yield tokens_matched

    for i in range(len(tokens)):
        if tokens[i] in node.children.keys():
            tokens_matched += (tokens[i],)
            new_tokens = tokens[i+1:]
            new_node = node.children[tokens[i]]
            yield from self.traverse(new_tokens, new_node, tokens_matched)



def generate_candidates(self, tokens):
    node = self.root

    results = []
    if tokens[0] in node.children.keys():
        new_node = node.children[tokens[0]]
        for i in range(1,len(tokens)):
            tokens_matched = (tokens[0],)
            new_tokens = tokens[i:]
            results.extend(list(self.traverse(new_tokens, new_node, tokens_matched)))

    return results

Example 1: given tokens = ['comic', 'book', 'art', 'on', 'the', 'web'], the output of my algorithm is correct:

[('comic', 'book'), ('comic', 'book', 'art')]

Example 2: given tokens = ['write', 'a', 'review', 'about', 'my', 'book'], the output of my algorithm is incorrect (the correct output should be [('write', 'a', 'book'), ('write', 'about'), ('write', 'book')]):

[('write', 'a', 'book'), ('write', 'a', 'about'), ('write', 'a', 'about', 'book'),\
('write', 'about'), ('write', 'about', 'book'), ('write', 'about'), ('write', 'about', 'book'),\
('write', 'book'), ('write', 'book')]

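Example 2 does not return the correct answer, most likely because I am handling part of the recursion incorrectly, but I am not sure exactly what is going wrong. Also, if you have any suggestions for improving the speed, please feel free to comment.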
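Two things in the code above appear to explain the extra results in Example 2. First, `tokens_matched += (tokens[i],)` rebinds the name inside the loop, so a match at one position stays in the prefix for every later position at the same level, which produces tuples such as ('write', 'a', 'about'). Second, the outer loop in generate_candidates restarts the traversal at every offset, although traverse already scans all later positions, which produces the duplicates. The sketch below avoids both. It uses standalone functions rather than methods, and the `Node` class, the `build_tree` helper, and the terminal test `if node.entries` (standing in for the diff check) are assumptions, because the real tree is not shown.

```python
class Node:
    def __init__(self, token=None):
        self.token = token
        self.children = {}
        self.entries = set()  # assumed: ids of entries ending at this node


def build_tree(phrases):
    root = Node()
    for entry_id, phrase in enumerate(phrases):
        node = root
        for token in phrase:
            node = node.children.setdefault(token, Node(token))
        node.entries.add(entry_id)
    return root


def traverse(tokens, node, tokens_matched):
    # Assumed terminal test; it stands in for the diff check in the question.
    if node.entries:
        yield tokens_matched
    for i, token in enumerate(tokens):
        child = node.children.get(token)
        if child is not None:
            # Pass a new tuple instead of rebinding tokens_matched, so a
            # match at position i does not leak into later iterations.
            yield from traverse(tokens[i + 1:], child, tokens_matched + (token,))


def generate_candidates(root, tokens):
    first = root.children.get(tokens[0])
    if first is None:
        return []
    # One call is enough: traverse already scans every later position.
    return list(traverse(tokens[1:], first, (tokens[0],)))


root = build_tree([("write", "a", "book"), ("write", "about"), ("write", "book"),
                   ("comic", "book"), ("comic", "book", "art")])
print(generate_candidates(root, ["write", "a", "review", "about", "my", "book"]))
print(generate_candidates(root, ["comic", "book", "art", "on", "the", "web"]))
```

With this hypothetical tree, the first call returns the three expected candidates for Example 2 and the second returns the two candidates from Example 1.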

Edit

I solved the problem; here is the updated code:

def traverse(self, tokens, node, tokens_matched):
    diff = len(node.entries) - len(node.children)
    if diff == 1 and node.children != {}:
        yield tokens_matched
    if node.children == {}:
        yield tokens_matched

    for i in range(len(tokens)):
        if tokens[i] in node.children.keys():
            tokens_matched += (tokens[i],)
            new_tokens = tokens[i+1:]
            current_token = node.token
            if current_token != tokens_matched[-2]:
                to_list = list(tokens_matched)
                to_list.pop(-2)
                tokens_matched = tuple(to_list)
            new_node = node.children[tokens[i]]
            yield from self.traverse(new_tokens, new_node, tokens_matched)

def generate_candidates(self, tokens):
    node = self.root

    tokens = tokens[0:min(len(tokens), self.MAX_GAP)]
    candidates = []
    if tokens[0] in node.children.keys():
        new_node = node.children[tokens[0]]
        tokens_matched = (tokens[0],)
        new_tokens = tokens[1:]
        candidates.extend(list(self.traverse(new_tokens, new_node, tokens_matched)))

    # ne generator
    candidates = CandidatesGenerator.ne_generator(tokens, candidates, ne_size=ne_size)

    return candidates

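One more question: any ideas on how to reduce the execution time? This matters a lot, because this code will be executed millions of times and I want to optimize it as much as possible (while staying within Python).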
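A few changes that may reduce per-call overhead, sketched under the same assumptions as before (a hypothetical `Node` class whose `entries` holds the ids of entries ending at that node): pass a start index instead of slicing the token list at every level, use a single `dict.get` instead of a membership test followed by a lookup, and replace the recursive generator with an explicit stack. Whether this is actually faster depends on the tree and the window sizes, so it is worth measuring with `timeit` on real data.

```python
class Node:
    def __init__(self, token=None):
        self.token = token
        self.children = {}
        self.entries = set()  # assumed: ids of entries ending at this node


def build_tree(phrases):
    root = Node()
    for entry_id, phrase in enumerate(phrases):
        node = root
        for token in phrase:
            node = node.children.setdefault(token, Node(token))
        node.entries.add(entry_id)
    return root


def generate_candidates_iterative(root, tokens):
    first = root.children.get(tokens[0]) if tokens else None
    if first is None:
        return []
    n = len(tokens)
    results = []
    # Each stack item: (node, index of the next token to scan, matched tuple).
    stack = [(first, 1, (tokens[0],))]
    while stack:
        node, start, matched = stack.pop()
        if node.entries:
            results.append(matched)
        children = node.children
        if not children:
            continue  # leaf: nothing further can match
        # Push in reverse so the earliest position is popped first, which
        # keeps the same result order as a recursive depth-first traversal.
        for i in range(n - 1, start - 1, -1):
            child = children.get(tokens[i])
            if child is not None:
                stack.append((child, i + 1, matched + (tokens[i],)))
    return results


root = build_tree([("write", "a", "book"), ("write", "about"), ("write", "book")])
print(generate_candidates_iterative(root, ["write", "a", "review", "about", "my", "book"]))
```

The function name `generate_candidates_iterative` is illustrative only; the MAX_GAP truncation and the ne_generator step from the updated code are left out of this sketch.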

0 answers:

No answers yet