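Suppose we have a tree structure in which each node has three attributes: token, children, and entries / ids.
Task: given a context window, for example
['take', 'a', 'long', 'bath', 'after', 'my', 'flight']
I want to generate all candidates related to the first token, take, which are
[(take, a, bath), (take, after), (take, flight)]
Note that the following check is there to handle early stopping; see Example 1, where both comic book and comic book art are candidates: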
diff = len(node.entries) - len(node.children)
if diff == 1 and node.children != {}:
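To make the setup concrete, here is a minimal sketch of such a tree and of what the check above evaluates to for the `book` node of Example 1. The `Node` and `Trie` class names, the `insert` method, and the reading of `entries` as "ids of every phrase that passes through the node" are assumptions for illustration; the post does not spell them out.

```python
class Node:
    """One tree node with the three attributes described above."""

    def __init__(self, token=None):
        self.token = token    # token on the edge leading to this node
        self.children = {}    # maps next token -> child Node
        self.entries = set()  # assumed: ids of phrases passing through this node


class Trie:
    def __init__(self):
        self.root = Node()

    def insert(self, entry_id, phrase_tokens):
        """Add one phrase, creating nodes as needed (hypothetical helper)."""
        node = self.root
        for token in phrase_tokens:
            child = node.children.get(token)
            if child is None:
                child = node.children[token] = Node(token)
            child.entries.add(entry_id)
            node = child


trie = Trie()
trie.insert(0, ["comic", "book"])
trie.insert(1, ["comic", "book", "art"])

book = trie.root.children["comic"].children["book"]
print(sorted(book.children))                   # ['art']
print(len(book.entries) - len(book.children))  # 1
```

Under these assumptions the `book` node has a child (`art`) but also ends a phrase of its own, which is the situation the early-stopping check is meant to catch.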
What I have tried:
def traverse(self, tokens, node, tokens_matched):
    # Early stopping: this node ends a phrase but also has children.
    diff = len(node.entries) - len(node.children)
    if diff == 1 and node.children != {}:
        yield tokens_matched
    # Leaf node: a complete phrase has been matched.
    if node.children == {}:
        yield tokens_matched
    for i in range(len(tokens)):
        if tokens[i] in node.children.keys():
            tokens_matched += (tokens[i],)
            new_tokens = tokens[i+1:]
            new_node = node.children[tokens[i]]
            yield from self.traverse(new_tokens, new_node, tokens_matched)

def generate_candidates(self, tokens):
    node = self.root
    results = []
    if tokens[0] in node.children.keys():
        new_node = node.children[tokens[0]]
        for i in range(1,len(tokens)):
            tokens_matched = (tokens[0],)
            new_tokens = tokens[i:]
            results.extend(list(self.traverse(new_tokens, new_node, tokens_matched)))
    return results
Example 1: given tokens = ['comic', 'book', 'art', 'on', 'the', 'web'],
the output of my algorithm is correct:
[('comic', 'book'), ('comic', 'book', 'art')]
Example 2: given tokens = ['write', 'a', 'review', 'about','my','book'],
the output of my algorithm is incorrect. The correct output should be [('write', 'a', 'book'),('write', 'about'), ('write', 'book')],
but I get:
[('write', 'a', 'book'), ('write', 'a', 'about'), ('write', 'a', 'about', 'book'),\
('write', 'about'), ('write', 'about', 'book'), ('write', 'about'), ('write', 'about', 'book'),\
('write', 'book'), ('write', 'book')]
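Example 2 does not return the correct answer, most likely because I am handling the recursion incorrectly somewhere, but I am not sure exactly what is going wrong. Also, if you have any suggestions for improving the speed, please feel free to comment.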
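Tracing Example 2 by hand suggests two causes. First, `tokens_matched += (tokens[i],)` rebinds the name inside the `for` loop, so a token matched by one sibling branch ('a') is still in the tuple when a later sibling ('about') is tried. Second, the outer loop in `generate_candidates` restarts the search at every suffix, which repeats matches. Below is a minimal sketch that extends the tuple per branch and traverses only once. The `Node` class, the `build` helper, the free-standing functions, and the assumption that the tree holds exactly the three phrases from the expected output are all illustrative; the `diff` early-stopping check is left out to isolate the issue.

```python
class Node:
    def __init__(self, token=None):
        self.token = token
        self.children = {}


def build(phrases):
    """Hypothetical helper: build a tree from lists of tokens."""
    root = Node()
    for phrase in phrases:
        node = root
        for token in phrase:
            if token not in node.children:
                node.children[token] = Node(token)
            node = node.children[token]
    return root


def traverse(tokens, node, tokens_matched):
    if not node.children:  # leaf: a complete phrase was matched
        yield tokens_matched
        return
    for i, token in enumerate(tokens):
        child = node.children.get(token)
        if child is not None:
            # Build a new tuple for this branch only; siblings are unaffected.
            yield from traverse(tokens[i + 1:], child, tokens_matched + (token,))


def generate_candidates(root, tokens):
    first = root.children.get(tokens[0])
    if first is None:
        return []
    # Traverse once; skipping over gaps already happens inside traverse().
    return list(traverse(tokens[1:], first, (tokens[0],)))


root = build([["write", "a", "book"], ["write", "about"], ["write", "book"]])
tokens = ["write", "a", "review", "about", "my", "book"]
print(generate_candidates(root, tokens))
# [('write', 'a', 'book'), ('write', 'about'), ('write', 'book')]
```

Because `tokens_matched + (token,)` creates a fresh tuple for each recursive call, nothing matched under 'a' can leak into the 'about' or 'book' branches.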
Edit:
I solved the problem; here is the updated code:
def traverse(self, tokens, node, tokens_matched):
    # Early stopping: this node ends a phrase but also has children.
    diff = len(node.entries) - len(node.children)
    if diff == 1 and node.children != {}:
        yield tokens_matched
    # Leaf node: a complete phrase has been matched.
    if node.children == {}:
        yield tokens_matched
    for i in range(len(tokens)):
        if tokens[i] in node.children.keys():
            tokens_matched += (tokens[i],)
            new_tokens = tokens[i+1:]
            current_token = node.token
            # Drop a token left over from a previously matched sibling branch.
            if current_token != tokens_matched[-2]:
                to_list = list(tokens_matched)
                to_list.pop(-2)
                tokens_matched = tuple(to_list)
            new_node = node.children[tokens[i]]
            yield from self.traverse(new_tokens, new_node, tokens_matched)

def generate_candidates(self, tokens):
    node = self.root
    # Only look at the first MAX_GAP tokens of the window.
    tokens = tokens[0:min(len(tokens), self.MAX_GAP)]
    candidates = []
    if tokens[0] in node.children.keys():
        new_node = node.children[tokens[0]]
        tokens_matched = (tokens[0],)
        new_tokens = tokens[1:]
        candidates.extend(list(self.traverse(new_tokens, new_node, tokens_matched)))
    # ne generator
    candidates = CandidatesGenerator.ne_generator(tokens, candidates, ne_size=ne_size)
    return candidates
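One more question: does anyone have ideas on how to reduce the execution time? This matters a lot, because this code will be executed millions of times and I want to optimize it as much as possible (while staying within Python).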
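A few micro-optimizations may help, though each should be measured on real data: test membership with `.get` or `in node.children` rather than `.keys()`, pass a start index instead of slicing `tokens` on every call, and append to one result list instead of chaining generators with `yield from`. The sketch below applies these to a tree stored as nested plain dicts. The `END` sentinel, `build`, and `candidates` are hypothetical names, the explicit end marker stands in for the `diff` check, and whether this is actually faster for a given workload is an assumption to verify with `timeit`.

```python
END = object()  # hypothetical sentinel key marking "a phrase ends here"


def build(phrases):
    """Store the tree as nested dicts: token -> subtree."""
    root = {}
    for phrase in phrases:
        node = root
        for token in phrase:
            node = node.setdefault(token, {})
        node[END] = True
    return root


def candidates(root, tokens):
    out = []
    first = root.get(tokens[0]) if tokens else None
    if first is None:
        return out

    def walk(node, start, matched):
        if END in node:  # phrase end, whether or not children exist
            out.append(matched)
        # Scan by index; no new list is created per recursive call.
        for i in range(start, len(tokens)):
            child = node.get(tokens[i])
            if child is not None:
                walk(child, i + 1, matched + (tokens[i],))

    walk(first, 1, (tokens[0],))
    return out


root = build([["comic", "book"], ["comic", "book", "art"]])
print(candidates(root, ["comic", "book", "art", "on", "the", "web"]))
# [('comic', 'book'), ('comic', 'book', 'art')]
```

If the same context windows recur often, caching results keyed by `tuple(tokens)` is another option worth timing.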