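Splitting a string into all possible ordered phrases

Asked: 2013-08-23 15:40:12

Tags: python string list

I'm trying to explore what Python's built-in functions can do. I'm working with a string such as: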

'the fast dog'

and I want to break it down into all possible ordered phrases, as lists. The example above should produce the following output:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

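The key point is that the original order of the words in the string must be preserved when generating the possible phrases.

I've managed to write a function that does this, but it is rather cumbersome and ugly. I was wondering whether some of Python's built-in functionality might help. My thought was to split the string at the various spaces and then apply the same step recursively to each split. Does anyone have suggestions?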

4 answers:

Answer 0: (score: 10)

Use itertools.combinations:

import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words)) # possible cut positions: 1 .. len(words)-1
    for n in ns: # choose n cut positions, giving n+1 parts (2 .. len(words) parts)
        for idxs in itertools.combinations(ns, n):
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]

Example:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
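
To make the zip trick in the list comprehension easier to follow, here is a small sketch that traces one combination by hand. The particular choice idxs = (1, 3) and the variable names pairs and phrases are only for illustration and are not part of the answer's code.

```python
words = 'the really fast dog'.split()

# One combination of cut positions: cut before words[1] and before words[3].
idxs = (1, 3)

# Pair each slice start with its slice end; None means "to the end of the list".
pairs = list(zip((0,) + idxs, idxs + (None,)))
print(pairs)    # [(0, 1), (1, 3), (3, None)]

# Each (start, end) pair becomes one phrase.
phrases = [' '.join(words[i:j]) for i, j in pairs]
print(phrases)  # ['the', 'really fast', 'dog']
```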

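Answer 1: (score: 4)

Think about the gaps between the words. Every subset of this set of gaps corresponds to a set of split points and, ultimately, to one way of splitting the phrase: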

the fast dog jumps
   ^1   ^2  ^3     - these are split points

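For example, the subset {1,3} corresponds to the split {"the", "fast dog", "jumps"}.

The subsets can be enumerated as the binary numbers from 1 to 2^(L-1)-1, where L is the number of words: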

001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
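
The enumeration above can be sketched in Python roughly as follows. The helper name split_by_mask is hypothetical, and the sketch assumes the mask is read left to right (first character is gap 1), which matches the table above; it also assumes the phrase contains at least one word.

```python
def split_by_mask(words, mask):
    """Split words according to mask, a string of '0'/'1' with one
    character per gap, read left to right ('1' means split at that gap)."""
    groups = [[words[0]]]
    for bit, word in zip(mask, words[1:]):
        if bit == '1':
            groups.append([word])    # split: start a new phrase
        else:
            groups[-1].append(word)  # no split: extend the current phrase
    return [' '.join(g) for g in groups]

words = 'the fast dog jumps'.split()
gaps = len(words) - 1

# Binary numbers 1 .. 2^(L-1)-1, zero-padded to one digit per gap.
for n in range(1, 2 ** gaps):
    mask = format(n, '0{}b'.format(gaps))
    print(mask, '->', split_by_mask(words, mask))
```

With this reading, the mask 101 is the subset {1,3} from the example and gives ['the', 'fast dog', 'jumps'].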

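Answer 2: (score: 3)

I'll elaborate a bit on @grep's solution, using only built-ins, as you asked in your question, and avoiding recursion. You could implement his answer along these lines: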

#! /usr/bin/python3

def partition (phrase):
    words = phrase.split () #split your phrase into words
    gaps = len (words) - 1 #one gap less than words (fencepost problem)
    for i in range (1 << gaps): #the 2^n possible partitions
        r = words [:1] #The result starts with the first word
        for word in words [1:]:
            if i & 1: r.append (word) #If "1" split at the gap
            else: r [-1] += ' ' + word #If "0", don't split at the gap
            i >>= 1 #Next 0 or 1 indicating split or don't split
        yield r #cough up r

for part in partition ('The really fast dog.'):
    print (part)
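
Note that the loop above starts at i = 0, so it also yields the phrase with no split at all, which the output in the question leaves out. If only real splits are wanted, one possible variant (a sketch, with the hypothetical name proper_partitions and an added guard for empty input) is to start the range at 1:

```python
def proper_partitions(phrase):
    """Like partition() above, but skips the mask with no split points."""
    words = phrase.split()
    if not words:  # assumption: an empty phrase has no partitions
        return
    gaps = len(words) - 1
    for i in range(1, 1 << gaps):  # start at 1 to skip the "no split" case
        r = words[:1]
        for word in words[1:]:
            if i & 1:
                r.append(word)        # bit is 1: split at this gap
            else:
                r[-1] += ' ' + word   # bit is 0: keep the phrase together
            i >>= 1
        yield r

print(list(proper_partitions('the fast dog')))
# [['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]
```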

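Answer 3: (score: 1)

The operation you're asking for is usually called a "partition", and it can be done on any kind of list. So let's implement partitioning for an arbitrary list: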

def partition(lst):
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

Note that longer lists have a lot of partitions, so it's best to implement this as a generator. To check that it works, try:

print(list(partition([1, 2, 3])))
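
To give a feel for how fast the number of partitions grows and why a generator helps, here is a rough Python 3 sketch. It restates partition so that it runs on its own; the names counts and first_three are just for illustration.

```python
from itertools import islice

def partition(lst):
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

# Number of partitions for lists of length 1..5: it doubles with each element.
counts = [sum(1 for _ in partition(list(range(n)))) for n in range(1, 6)]
print(counts)  # [1, 2, 4, 8, 16]

# Because partition is a generator, the first few results of a long list
# can be taken without building all 2**19 partitions of a 20-element list.
first_three = list(islice(partition(list(range(20))), 3))
print(len(first_three))  # 3
```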

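Now you want to partition a string using words as the elements. The simplest way to do this is to split the text into words, run the original partition algorithm, and join each group of words back into a string: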

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

Again, to test it, use:

print(list(word_partition('the fast dog')))
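
For reference, here is a self-contained Python 3 sketch of the two functions together with the output they produce. The last entry is the unsplit phrase, which the question does not list; the filter at the end (the name splits_only is hypothetical) is one way to drop it.

```python
def partition(lst):
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

result = list(word_partition('the fast dog'))
print(result)
# [['the', 'fast', 'dog'], ['the', 'fast dog'], ['the fast', 'dog'], ['the fast dog']]

# Keep only partitions that actually split the phrase.
splits_only = [p for p in result if len(p) > 1]
print(splits_only)
# [['the', 'fast', 'dog'], ['the', 'fast dog'], ['the fast', 'dog']]
```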