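I've been searching for a while, but I can't seem to find an answer to this small problem.
I have this code that is supposed to split a string after every three words: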
import re

def splitTextToTriplet(Text):
    x = re.split(r'^((?:\S+\s+){2}\S+).*', Text)
    return x

print(splitTextToTriplet("Do you know how to sing"))
Currently the output is:
['', 'Do you know', '']
But I'm actually expecting this output:
['Do you know', 'how to sing']
And print(splitTextToTriplet("Do you know how to")) should output:
['Do you know', 'how to']
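How can I change the regex to produce the expected output?
Answer 0 (score: 6)
I think re.split may not be the best approach here, because lookbehinds in Python's re module can't use variable-length patterns.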
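To make the lookbehind point concrete, here is a small check (not part of the original answer; try_compile is a hypothetical helper). A lookbehind meaning "after three words and their whitespace" has no fixed width, so Python's re refuses to compile it, whereas a fixed-width lookbehind such as a single space compiles fine.

```python
import re

def try_compile(pattern):
    """Return None if pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Zero-width split point "after three words": \S+ and \s+ have no fixed
# width, so Python's re rejects this lookbehind with re.error.
print(try_compile(r'(?<=(?:\S+\s+){3})'))

# A fixed-width lookbehind (exactly one space) compiles fine.
print(try_compile(r'(?<= )'))  # None
```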
Instead, you can use str.split and then join the words back together in groups of three.
def splitTextToTriplet(string):
    words = string.split()
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words
splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']
splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to']
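While this is the solution I'd recommend, note that str.split() splits on any run of whitespace and the words are rejoined with single spaces, so if some of your whitespace is newlines, that information is lost in the process.
Answer 1 (score: 3)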
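If that whitespace matters, one possible variant (a sketch, not part of the original answer; split_text_to_triplet_keep_ws is a hypothetical name) keeps the same "group every three words" idea but slices the original string at the word positions found by re.finditer. Newlines, tabs, and repeated spaces inside a group survive; whitespace between groups is still dropped.

```python
import re

def split_text_to_triplet_keep_ws(text, n=3):
    """Group every n words, slicing the original text so the whitespace
    between words inside a group (newlines, tabs, runs of spaces) is kept."""
    # (start, end) of each maximal run of non-whitespace, i.e. each word
    spans = [m.span() for m in re.finditer(r'\S+', text)]
    groups = []
    for i in range(0, len(spans), n):
        chunk = spans[i:i + n]
        # From the start of the chunk's first word to the end of its last word
        groups.append(text[chunk[0][0]:chunk[-1][1]])
    return groups

print(split_text_to_triplet_keep_ws("Do you\nknow how  to sing"))
# ['Do you\nknow', 'how  to sing']
```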
I used re.findall to get the output you expect. To make the split function more general, I replaced splitTextToTriplet with splitTextonWords, which takes numberOfWords as a parameter:
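import re

def splitTextonWords(Text, numberOfWords=1):
    if numberOfWords > 1:
        # One word (\S+), then up to numberOfWords - 1 further words, each
        # preceded by whitespace; findall skips leading whitespace itself.
        pattern = r'\S+(?:\s+\S+){0,' + str(numberOfWords - 1) + '}'
        x = re.findall(pattern, Text)
    elif numberOfWords == 1:
        x = Text.split()
    else:
        # Zero or negative word counts are not meaningful
        x = None
    return x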
print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this code will fail at ", 3))
print(splitTextonWords(" A sentence this code will fail at s", 3))
print(splitTextonWords(" A sentence this code will fail at s", 4))
print(splitTextonWords(" A sentence this code will fail at s", 2))
print(splitTextonWords(" A sentence this code will fail at s", 1))
print(splitTextonWords(" A sentence this code will fail at s", 0))
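Output:
['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None
Answer 2 (score: 2)
Use the grouper itertools recipe: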
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
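A quick look at what the recipe yields for the question's second example (a sketch; the input sentence is just illustrative) shows why the joining step needs care: the last chunk is padded with fillvalue, which defaults to None, so passing that chunk straight to " ".join raises a TypeError.

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (older itertools recipe)."""
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

chunks = list(grouper("Do you know how to".split(), 3))
print(chunks)
# [('Do', 'you', 'know'), ('how', 'to', None)]

try:
    " ".join(chunks[-1])  # the padded last chunk contains None
except TypeError as exc:
    print("join failed:", exc)
```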
See also the third-party more_itertools library, which implements this recipe for you.
Code
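def split_text_to_triplet(s):
    """Return strings of (up to) three words."""
    # grouper takes the iterable first, then the chunk size; filter(None, ...)
    # drops the None padding that zip_longest adds to a short last chunk.
    return [" ".join(filter(None, c)) for c in grouper(s.split(), 3)]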
split_text_to_triplet("Do you know how to sing")
# ['Do you know', 'how to sing']
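If you'd rather avoid padding altogether, chunks of up to three items can be taken with itertools.islice; Python 3.12 added itertools.batched, which yields tuples in this way. The batched function below is a hand-written sketch for older versions, not the standard-library implementation, and split_text_to_triplet_batched is a hypothetical name.

```python
from itertools import islice

def batched(iterable, n):
    """Yield tuples of up to n items; only the last may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while True:
        batch = tuple(islice(it, n))
        if not batch:  # iterator exhausted
            return
        yield batch

def split_text_to_triplet_batched(s):
    """Return strings of up to three words, with no padding to strip."""
    return [" ".join(batch) for batch in batched(s.split(), 3)]

print(split_text_to_triplet_batched("Do you know how to"))
# ['Do you know', 'how to']
```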