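That is, given a sentence, break it into every possible grouping of its words, keeping the words in order and leaving none out.
For example, given the input "The cat sat on the mat",
the output would be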
[("The", "cat sat on the mat"),
("The cat", "sat on the mat"),
("The cat", "sat", "on the mat")] #etc
but not
("The mat", "cat sat on the") # out of order
("The cat"), ("mat") # words missing
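I looked at the functions in itertools but can't see how they would do the job: combinations leaves items out (("cat", "mat")), and permutations changes the order.
Am I missing something in these tools, or are they just not the right ones?
(To be clear, this is not a question about how to split a string, but about how to get the combinations.)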
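To make the observation about itertools concrete, here is a small sketch (the variable names are only illustrative) showing that combinations keeps word order but drops words, while permutations keeps every word but reorders them:

```python
from itertools import combinations, islice, permutations

words = "The cat sat on the mat".split()

# combinations picks a subset in the original order, so words go missing.
pairs = list(combinations(words, 2))
print(("cat", "mat") in pairs)   # order kept, but the other words are dropped

# permutations keeps all six words but rearranges them.
first_two = list(islice(permutations(words), 2))
print(first_two[1])              # same words, last two swapped
```

Neither function on its own produces groupings that are both complete and in order.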
Answer 0 (score: 1)
Taking inspiration from Raymond Hettinger's partition recipe (from this blog post on WordAligned), adapted for Python 3 and extended to cover every partition of a list, we can do this with chain and combinations from itertools.
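from itertools import chain, combinations

def partition(input_list):
    n = len(input_list)
    # b and e are the fixed start and end boundaries; mid holds the n - 1 possible cut points.
    b, mid, e = [0], list(range(1, n)), [n]
    # Every choice of i cut points, for i = 0 .. n - 1.
    splits = (d for i in range(n) for d in combinations(mid, i))
    # Pair each boundary with the next one and slice out the consecutive groups.
    return [[input_list[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]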
Demo:
>>> input_list = "The cat sat on the mat".split()
>>> print(partition(input_list))
[[['The', 'cat', 'sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the', 'mat']], [['The', 'cat', 'sat', 'on'], ['the', 'mat']], [['The', 'cat', 'sat', 'on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on'], ['the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the'], ['mat']], [['The', 'cat'], ['sat'], ['on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on'], ['the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the'], ['mat']], [['The', 'cat', 'sat', 'on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on', 'the', 'mat']], [['The'], ['cat'], ['sat', 'on'], ['the', 'mat']], [['The'], ['cat'], ['sat', 'on', 'the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the'], ['mat']], [['The'], ['cat', 'sat', 'on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the', 'mat']], [['The', 'cat'], ['sat'], ['on', 'the'], ['mat']], [['The', 'cat'], ['sat', 'on'], ['the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the', 'mat']], [['The'], ['cat'], ['sat'], ['on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on'], ['the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the'], ['mat']]]
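The output above is a list of word lists, whereas the question asked for tuples of phrases. A minimal sketch of one way to get that shape, using the same cut-point idea, might look like the following. The name phrase_partitions is hypothetical and not part of the original answer, and the sketch assumes words are separated by whitespace.

```python
from itertools import combinations

def phrase_partitions(sentence):
    """Yield every in-order split of a sentence as a tuple of phrases."""
    words = sentence.split()
    n = len(words)
    # Choose k cut positions among the n - 1 gaps between words.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            # Consecutive boundary pairs delimit each phrase.
            yield tuple(" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:]))

results = list(phrase_partitions("The cat sat on the mat"))
print(len(results))  # 2 ** (6 - 1) splits, including the unsplit sentence
```

Because it is a generator, the splits are produced lazily, which helps for longer sentences since the number of splits doubles with each extra word. If the unsplit sentence is not wanted, starting k at 1 should skip it.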