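I am trying to build a spelling corrector using a graph.
Step 1: I use some books as a corpus and the Python networkx package to build a directed graph whose nodes are words. Each edge carries an attribute named "DISTANCE" that represents the distance between the two words it connects.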
Example:
graph['love']['you']['DISTANCE'] = 37 means that 'love you' appears 37 times in my corpus.
graph['you']['love']['DISTANCE'] = 39 means that 'you love' appears 39 times in my corpus.
Clearly, graph['love']['you'] and graph['you']['love'] are different. My problem is that after some processing I end up with a list of lists like this (the length varies):
[
[who,whom,whose],
[are,all],
[that,than,this]
]
Each sublist contains the candidate correct words for one position. I want to convert this list into the following:
[
[who,are,that],
[who,are,than],
[who,are,this],
[who,all,that],
[who,all,than],
[who,all,this],
[whom,are,that],
[whom,are,than],
[whom,are,this],
[whom,all,that],
[whom,all,than],
[whom,all,this],
[whose,are,that],
[whose,are,than],
[whose,are,this],
[whose,all,that],
[whose,all,than],
[whose,all,this],
]
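Then I can compute the distance for each combination and pick the best one.
I am new to algorithms. Do you know which algorithm fits this requirement? If you have any suggestions for making this spelling corrector more efficient, please let me know.
Thanks!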
Answer 0 (score: 3)
You can use itertools.product to do the conversion:
from itertools import product
d = [
['who','whom','whose'],
['are','all'],
['that','than','this']
]
# product(*d) yields one tuple per combination, taking one word from each sublist
print(list(product(*d)))
Formatted output:
[
('who', 'are', 'that'),
('who', 'are', 'than'),
('who', 'are', 'this'),
('who', 'all', 'that'),
('who', 'all', 'than'),
('who', 'all', 'this'),
('whom', 'are', 'that'),
('whom', 'are', 'than'),
('whom', 'are', 'this'),
('whom', 'all', 'that'),
('whom', 'all', 'than'),
('whom', 'all', 'this'),
('whose', 'are', 'that'),
('whose', 'are', 'than'),
('whose', 'are', 'this'),
('whose', 'all', 'that'),
('whose', 'all', 'than'),
('whose', 'all', 'this')
]
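If the end goal is to pick the best combination, you may not need to build the full list at all, since the number of combinations is the product of the sublist lengths. Below is a minimal sketch of one way to do that. It makes several assumptions that are not in the question: a plain dict named bigram_counts stands in for graph[u][v]['DISTANCE'], the counts in it are made up for illustration, the helper names score and best_candidate are hypothetical, and a higher total count is treated as better.

```python
from itertools import product

# Hypothetical counts standing in for graph[u][v]['DISTANCE'].
# These numbers are invented for illustration, not taken from a real corpus.
bigram_counts = {
    ('who', 'are'): 12,
    ('are', 'that'): 3,
    ('who', 'all'): 1,
    ('all', 'that'): 9,
    ('whom', 'are'): 2,
    ('are', 'this'): 4,
}

def score(words, counts):
    """Sum the counts of each adjacent word pair; unseen pairs count as 0."""
    return sum(counts.get(pair, 0) for pair in zip(words, words[1:]))

def best_candidate(candidates, counts):
    """Return the highest-scoring combination, taking one word per sublist."""
    # product() yields combinations lazily, so the full list is never stored.
    return max(product(*candidates), key=lambda words: score(words, counts))

d = [
    ['who', 'whom', 'whose'],
    ['are', 'all'],
    ['that', 'than', 'this'],
]
best = best_candidate(d, bigram_counts)
print(list(best))  # ['who', 'are', 'this'] with the made-up counts above
```

With a real networkx graph you would replace the dict lookup with a lookup of the edge attribute, and if you need lists rather than tuples, wrap each result in list() as shown in the last line.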