Question

I'm exploring what Python's built-in functions can do. I want to take a string such as:

'the fast dog'

and break it into every possible ordered phrasing, as a list of lists. For the example above, the output would be:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

The key requirement is that the original order of the words in the string is preserved when the phrases are generated.

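I already have a function that does this, but it is rather cumbersome and ugly. I'm wondering whether some of Python's built-in functionality could help. My thought was to split the string at each of the spaces and then apply the same step recursively to each split. Does anyone have suggestions?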

Answer 1

Use itertools.combinations:

import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words))  # candidate cut positions: 1 .. len(words)-1
    for n in ns:  # n cuts give n+1 parts: 2, 3, ..., len(words) parts
        for idxs in itertools.combinations(ns, n):
            # pair each start index with the next cut; None closes the last slice
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]

Examples:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
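
To see how the zip in break_down turns one combination of cut positions into slices, here is a small walk-through for a single, hand-picked combination. The variable names bounds and parts are only illustrative and do not appear in the answer's code.

```python
words = 'the really fast dog'.split()
idxs = (1, 3)  # one combination of cut positions, chosen by hand

# Each slice starts where the previous one ended; None means "to the end".
bounds = list(zip((0,) + idxs, idxs + (None,)))
parts = [' '.join(words[i:j]) for i, j in bounds]

print(bounds)  # [(0, 1), (1, 3), (3, None)]
print(parts)   # ['the', 'really fast', 'dog']
```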

Answer 2

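Think about the gaps between the words. Each subset of that set of gaps corresponds to a set of split points, and therefore to one way of splitting the phrase: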

the fast dog jumps
   ^1   ^2  ^3     - these are split points

For example, the subset {1,3} corresponds to the split {"the", "fast dog", "jumps"}.

The subsets can be enumerated as the binary numbers from 1 to 2^(L-1)-1, where L is the number of words:

001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
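
The mapping from a binary number to a split could be sketched roughly as follows. The helper name split_by_mask is hypothetical, and the bit order (leftmost digit is the first gap) is an assumption chosen to match the table above. The sketch also assumes the phrase has at least one word.

```python
def split_by_mask(words, mask):
    """Split words according to a bit string such as '010' (hypothetical helper).

    mask[k] == '1' means "cut at gap k+1", i.e. just before words[k+1].
    """
    groups = [[words[0]]]
    for bit, word in zip(mask, words[1:]):
        if bit == '1':
            groups.append([word])    # start a new phrase at this gap
        else:
            groups[-1].append(word)  # keep extending the current phrase
    return [' '.join(g) for g in groups]


words = 'the fast dog jumps'.split()
gaps = len(words) - 1
for n in range(1, 2 ** gaps):
    mask = format(n, '0{}b'.format(gaps))  # zero-padded binary, e.g. 1 -> '001'
    print(mask, '->', split_by_mask(words, mask))
```

Starting the loop at 1 rather than 0 skips the empty subset, which would leave the phrase unsplit.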

Answer 3

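I'll elaborate a bit on @grep's solution, using only built-ins as you stated in the question and avoiding recursion. You could implement his answer along these lines: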

#! /usr/bin/python3

def partition (phrase):
    words = phrase.split () #split your phrase into words
    gaps = len (words) - 1 #one gap less than words (fencepost problem)
    for i in range (1 << gaps): #the 2^n possible partitions
        r = words [:1] #The result starts with the first word
        for word in words [1:]:
            if i & 1: r.append (word) #If "1" split at the gap
            else: r [-1] += ' ' + word #If "0", don't split at the gap
            i >>= 1 #Next 0 or 1 indicating split or don't split
        yield r #cough up r

for part in partition ('The really fast dog.'):
    print (part)
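
Note that the generator above also yields the unsplit phrase (when i is 0), which the expected output in the question leaves out. Below is a minimal sketch of a variant that starts the counter at 1 to skip that case. The name proper_partitions is made up for this illustration, and the guard for an empty string is an addition.

```python
def proper_partitions(phrase):
    """Yield every split of phrase into two or more parts (hypothetical variant)."""
    words = phrase.split()
    gaps = max(len(words) - 1, 0)      # guard so an empty phrase yields nothing
    for mask in range(1, 1 << gaps):   # starting at 1 skips "no cut at all"
        parts = words[:1]
        for k, word in enumerate(words[1:]):
            if (mask >> k) & 1:        # bit k set: cut before this word
                parts.append(word)
            else:                      # bit k clear: glue to the previous part
                parts[-1] += ' ' + word
        yield parts


for parts in proper_partitions('the fast dog'):
    print(parts)
```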

Answer 4

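The operation you're asking for is usually called a "partition", and it can be done on any kind of list. So let's implement partitioning for an arbitrary list: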

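def partition(lst):
    # choose the first group lst[:i], then partition the rest recursively
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]  # the partition that keeps the whole list in one piece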

Note that longer lists have a great many partitions, so this is best implemented as a generator. To check that it works, try:

print(list(partition([1, 2, 3])))

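Now you want to partition a string using words as the elements. The simplest way is to split the text into words, run the original partition algorithm, and join each group of words back into a string: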

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

Again, to test it, use:

print(list(word_partition('the fast dog')))
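
As a quick sanity check, here is a Python 3 sketch of the same recursive idea that also counts the results. A list of n elements has n-1 gaps, so it should have 2^(n-1) partitions, including the one that leaves the list whole. The function names mirror the answer's, but the counting check is an addition.

```python
def partition(lst):
    # Choose the first group lst[:i], then partition the rest recursively.
    for i in range(1, len(lst)):
        for rest in partition(lst[i:]):
            yield [lst[:i]] + rest
    yield [lst]  # the partition that keeps lst in one piece


def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]


results = list(word_partition('the fast dog'))
print(results)
# [['the', 'fast', 'dog'], ['the', 'fast dog'], ['the fast', 'dog'], ['the fast dog']]

# n elements have n-1 gaps, each either cut or not: 2**(n-1) partitions.
for n in range(1, 8):
    assert len(list(partition(list(range(n))))) == 2 ** (n - 1)
```

The last item is the unsplit phrase. It can be dropped if only splits into two or more parts are wanted, as in the question's expected output.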

Splitting a string into all possible ordered phrases