Python: split a string every three words

Posted: 2018-03-10 00:49:52

Tags: python regex python-3.x

I have been searching for a while, but I can't seem to find an answer to this small problem.

I have this code that is supposed to split a string after every three words:

import re

def splitTextToTriplet(Text):
    x = re.split(r'^((?:\S+\s+){2}\S+).*', Text)
    return x


print(splitTextToTriplet("Do you know how to sing"))

The current output is:

['', 'Do you know', '']

But I am actually expecting this output:

['Do you know', 'how to sing'] 

If I run print(splitTextToTriplet("Do you know how to")), it should also output:

['Do you know', 'how to'] 

How can I change the regex to produce the expected output?
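A note on why the current output looks like that: re.split returns the text before each match, then any captured groups, then the text after it. The pattern is anchored with ^ and ends in .*, so it matches the whole string exactly once; the text on both sides of that match is empty, and the captured group holds the first three words. A small demonstration using only the standard re module:

```python
import re

text = "Do you know how to sing"

# One match spans the entire string: empty text before it, the captured
# group in the middle, empty text after it.
with_group = re.split(r'^((?:\S+\s+){2}\S+).*', text)
print(with_group)  # ['', 'Do you know', '']

# Without a capturing group, the separators themselves are dropped.
plain = re.split(r'\s+', text)
print(plain)  # ['Do', 'you', 'know', 'how', 'to', 'sing']
```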

3 Answers

Answer 0 (score: 6)

I don't think re.split is the best approach here, because a lookbehind cannot take a variable-length pattern.
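To illustrate the lookbehind restriction: the standard re module only accepts fixed-width lookbehind, so a split pattern meaning "whitespace that follows three words" is rejected when it is compiled. A minimal check; the helper name compiles is hypothetical:

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehind: accepted.
fixed_ok = compiles(r'(?<=abc)\s')

# Variable-width lookbehind ("after three words"): rejected with
# "look-behind requires fixed-width pattern".
variable_ok = compiles(r'(?<=(?:\S+\s+){2}\S+)\s+')

print(fixed_ok, variable_ok)  # True False
```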

Instead, you can use str.split and then join the words back together three at a time.

def splitTextToTriplet(string):
    words = string.split()
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words

splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']

splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to'] 

Be aware, though, that with this solution, if some of your whitespace consists of line breaks, that information is lost in the process.
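If the original whitespace matters, one option (an assumption, not part of this answer) is to match the groups with re.findall instead of splitting: the pattern \S+(?:\s+\S+){0,2} matches one to three words and keeps whatever whitespace separated them inside each group, including line breaks. Whitespace between groups is still discarded. A sketch; the function name split_keep_whitespace is hypothetical:

```python
import re

def split_keep_whitespace(text, n=3):
    """Hypothetical helper: chunks of up to n words, inner whitespace kept as-is.

    Whitespace *between* chunks is still dropped; only the whitespace
    inside each chunk survives.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # One word, then up to n-1 further words, each preceded by whitespace.
    pattern = r'\S+(?:\s+\S+){0,%d}' % (n - 1)
    return re.findall(pattern, text)

print(split_keep_whitespace("Do you\nknow how to sing"))
# ['Do you\nknow', 'how to sing']
```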

Answer 1 (score: 3)

I used re.findall to produce your expected output. To get a more general splitting function, I replaced splitTextToTriplet with splitTextonWords, which takes numberOfWords as a parameter:

import re

def splitTextonWords(Text, numberOfWords=1):
    if (numberOfWords > 1):
        text = Text.lstrip()
        # One word followed by up to numberOfWords-1 more, each preceded by whitespace
        pattern = r'\S+(?:\s+\S+){0,' + str(numberOfWords - 1) + '}'
        x =re.findall(pattern,text)
    elif (numberOfWords == 1):
        x = Text.split()
    else: 
        x = None
    return x

print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this             code will fail at ", 3))
print(splitTextonWords("   A sentence this code will fail at s", 3))
print(splitTextonWords("   A sentence this code will fail at s", 4))
print(splitTextonWords("   A sentence this code will fail at s", 2))
print(splitTextonWords("   A sentence this code will fail at s", 1))
print(splitTextonWords("   A sentence this code will fail at s", 0))

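Output:

['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None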

Answer 2 (score: 2)

Use the grouper recipe from the itertools documentation:

import itertools


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
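A quick look at what the recipe yields for a word list: the last chunk is padded with fillvalue, which is None by default. Because " ".join raises TypeError on a None item, the padding has to be removed before joining when the word count is not a multiple of three:

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

chunks = list(grouper("Do you know how to".split(), 3))
print(chunks)
# [('Do', 'you', 'know'), ('how', 'to', None)]

# Drop the None padding before joining each chunk back into a string.
joined = [" ".join(w for w in c if w is not None) for c in chunks]
print(joined)
# ['Do you know', 'how to']
```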

See also the third-party more_itertools library, which implements this recipe for you.

Code

def split_text_to_triplet(s):
    """Return strings of three words."""
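    # grouper takes (iterable, n); the last chunk is padded with None,
    # so skip the padding before joining.
    return [" ".join(w for w in c if w is not None) for c in grouper(s.split(), 3)]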


split_text_to_triplet("Do you know how to sing")
# ['Do you know', 'how to sing']
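As an alternative to the padded recipe: Python 3.12 added itertools.batched(iterable, n), which yields tuples of up to n items without any padding, so nothing needs to be stripped before joining. A sketch that uses it when available and otherwise falls back to an islice loop; the wrapper name batched and the fallback are assumptions, not part of the answer above:

```python
import itertools

def batched(iterable, n):
    """Hypothetical wrapper: tuples of up to n items, no padding."""
    if hasattr(itertools, "batched"):  # Python 3.12+
        return itertools.batched(iterable, n)
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling until the empty tuple appears.
    return iter(lambda: tuple(itertools.islice(it, n)), ())

def split_text_to_triplet(s):
    """Return strings of up to three words."""
    return [" ".join(chunk) for chunk in batched(s.split(), 3)]

print(split_text_to_triplet("Do you know how to"))
# ['Do you know', 'how to']
```

The last tuple is simply shorter, which matches the expected output in the question without a None filter.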