I'm exploring what Python's built-in functions can do. I'm trying to take a string such as:
'the fast dog'
and break it down into all possible ordered phrases, as lists. The example above would produce:
[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]
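The key point is that the original order of the words in the string must be preserved when generating the possible phrases.
I've managed to write a function that does this, but it's rather cumbersome and ugly. I was wondering whether some of Python's built-in functionality might help. My thought was to split the string at the various spaces and then apply that recursively to each split. Does anyone have suggestions?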
Answer 0 (score: 10)
import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words))  # candidate split positions: 1..len(words)-1
    for n in ns:  # use n split points, i.e. split into 2, 3, ..., len(words) parts
        for idxs in itertools.combinations(ns, n):
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]
Example:
>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']
>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
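To make the zip line in break_down easier to follow, here is a small sketch that applies the same slicing to a single tuple of split indices. The helper name slices_for is hypothetical and not part of the answer.

```python
def slices_for(words, idxs):
    """Join the word groups delimited by the split indices in idxs."""
    starts = (0,) + idxs    # each group starts at 0 or at a split index
    ends = idxs + (None,)   # and ends at the next split index, or at the end of the list
    return [' '.join(words[i:j]) for i, j in zip(starts, ends)]

words = 'the really fast dog'.split()
result = slices_for(words, (1, 3))
print(result)  # ['the', 'really fast', 'dog']
```

Here the split indices (1, 3) pair up as (0, 1), (1, 3) and (3, None), so the slices are words[0:1], words[1:3] and words[3:].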
Answer 1 (score: 4)
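Think about the gaps between the words. Each subset of that set of gaps corresponds to a set of split points and, in turn, to a partition of the phrase: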
the fast dog jumps
   ^1   ^2  ^3      - these are split points
For example, the subset {1,3}
corresponds to the split {"the", "fast dog", "jumps"}
The subsets can be enumerated as the binary numbers from 1 to 2^(L-1)-1, where L is the number of words:
001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
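The enumeration above could be sketched roughly as follows. This is only an illustration: the function name split_by_mask is made up, it assumes at least one word, and it assumes the leftmost binary digit stands for gap 1, which is the convention the table above appears to use.

```python
def split_by_mask(words, mask):
    """Split words at every gap whose bit is set in mask.

    Assumed convention: of the len(words) - 1 binary digits,
    the leftmost one stands for gap 1.
    """
    gaps = len(words) - 1
    groups = [[words[0]]]
    for gap in range(1, gaps + 1):
        bit = (mask >> (gaps - gap)) & 1
        if bit:
            groups.append([words[gap]])    # split here: start a new group
        else:
            groups[-1].append(words[gap])  # no split: extend the current group
    return [' '.join(group) for group in groups]

words = 'the fast dog jumps'.split()
gaps = len(words) - 1
for mask in range(1, 2 ** gaps):
    print(format(mask, '0{}b'.format(gaps)), '->', split_by_mask(words, mask))
```

With this convention the mask 101 is the subset {1,3} and gives ['the', 'fast dog', 'jumps'].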
Answer 2 (score: 3)
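I'll elaborate on @grep's solution, using only built-ins as you stated in your question and avoiding recursion. You could implement his answer along these lines: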
#!/usr/bin/python3

def partition(phrase):
    words = phrase.split()       # split the phrase into words
    gaps = len(words) - 1        # one gap fewer than words (fencepost problem)
    for i in range(1 << gaps):   # the 2^gaps possible partitions
        r = words[:1]            # the result starts with the first word
        for word in words[1:]:
            if i & 1:
                r.append(word)       # bit is 1: split at this gap
            else:
                r[-1] += ' ' + word  # bit is 0: don't split at this gap
            i >>= 1              # next bit: split or don't split
        yield r

for part in partition('The really fast dog.'):
    print(part)
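Note that the loop above starts at i = 0, so the first result is the unsplit phrase, which the question's expected output does not include. A possible variant that skips it is sketched below. The name proper_partitions is hypothetical, and the sketch assumes the phrase contains at least one word.

```python
def proper_partitions(phrase):
    """Yield every split of phrase into two or more parts, keeping word order."""
    words = phrase.split()
    gaps = len(words) - 1
    for mask in range(1, 1 << gaps):   # start at 1 to skip the "no splits" case
        result = words[:1]
        for k, word in enumerate(words[1:]):
            if (mask >> k) & 1:
                result.append(word)        # bit k set: split before this word
            else:
                result[-1] += ' ' + word   # bit k clear: extend the current part
        yield result

print(list(proper_partitions('the fast dog')))
# [['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]
```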
Answer 3 (score: 1)
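The operation you're asking for is usually called a "partition", and it can be done on any kind of list. So let's implement partitioning for an arbitrary list: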
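def partition(lst):
    for i in range(1, len(lst)):           # lst[:i] is the first group
        for r in partition(lst[i:]):       # partition the remainder recursively
            yield [lst[:i]] + r
    yield [lst]                            # the whole list as a single group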
Note that longer lists have a great many partitions, so it's best to implement this as a generator. To check that it works, try:
print(list(partition([1, 2, 3])))
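Because partition is a generator, a caller can pull just the first few partitions of a long list without building all of them. A minimal sketch, written for Python 3 and repeating the definition so that it runs on its own:

```python
from itertools import islice

def partition(lst):
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

# A 20-element list has 2**19 partitions; islice pulls only the first three.
first_three = list(islice(partition(list(range(20))), 3))
print(len(first_three))  # 3

# For a short list it is cheap to count them all: 2**(n-1) for n elements.
print(len(list(partition([1, 2, 3, 4]))))  # 8
```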
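Now, you want to partition a string using words as the elements. The simplest way to do this is to split the text into words, run the original partition algorithm, and join each group of words back into a string: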
def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]
Again, to test it, use:
print(list(word_partition('the fast dog')))