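I am trying to split the message text for a messaging system into sequences of at most 160 characters that end in a space, unless it is the last sequence, in which case it can end in anything as long as it is 160 characters or fewer.
The regex '.{1,160}\s' almost works, but it cuts off the last word of a message, because the last character of a message is usually not a space.
I also tried '.{1,160}\s|.{1,160}', but that doesn't work either, because the final sequence is then just the text remaining after the last space. Does anyone know how to do this?
Example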
two_cities = ("It was the best of times, it was the worst of times, it was " +
"the age of wisdom, it was the age of foolishness, it was the " +
"epoch of belief, it was the epoch of incredulity, it was the " +
"season of Light, it was the season of Darkness, it was the " +
"spring of hope, it was the winter of despair, we had " +
"everything before us, we had nothing before us, we were all " +
"going direct to Heaven, we were all going direct the other " +
"way-- in short, the period was so far like the present period," +
" that some of its noisiest authorities insisted on its being " +
"received, for good or for evil, in the superlative degree of " +
"comparison only.")
import re

chunks = re.findall(r'.{1,160}\s|.{1,160}', two_cities)
print(chunks)
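returns
['It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of ', 'incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we ', 'had nothing before us, we were all going direct to Heaven, we were all going direct the other way-- in short, the period was so far like the present period, ', 'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison ', 'only.']
The last element of the list should be
'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.'
not 'only.'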
Answer 0 (score: 1)
Try this - .{1,160}(?:(?<=[ ])|$)
.{1,160} # 1 - 160 chars
(?:
(?<= [ ] ) # Lookbehind, must end with a space
| $ # or, be at End of String
)
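Info -
By default, the engine first tries to match 160 characters (greedy), then checks the next part of the expression.
The lookbehind forces the last character matched by .{1,160} to be a space.
Alternatively, if the match has reached the end of the string, that requirement is not enforced.
If the lookbehind fails and the match is not at the end of the string, the engine backtracks to 159 characters and checks again. This repeats until one of the assertions passes.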
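As a quick check of the pattern above, here is a minimal Python sketch that applies it to the text from the question. The names LIMIT and pattern are introduced here purely for illustration; only the regex itself comes from the answer.

```python
import re

LIMIT = 160  # maximum chunk length, taken from the question

two_cities = ("It was the best of times, it was the worst of times, it was " +
              "the age of wisdom, it was the age of foolishness, it was the " +
              "epoch of belief, it was the epoch of incredulity, it was the " +
              "season of Light, it was the season of Darkness, it was the " +
              "spring of hope, it was the winter of despair, we had " +
              "everything before us, we had nothing before us, we were all " +
              "going direct to Heaven, we were all going direct the other " +
              "way-- in short, the period was so far like the present period," +
              " that some of its noisiest authorities insisted on its being " +
              "received, for good or for evil, in the superlative degree of " +
              "comparison only.")

# Up to LIMIT characters that either end in a space or reach the end of the string.
pattern = re.compile(r".{1,%d}(?:(?<=[ ])|$)" % LIMIT)
chunks = pattern.findall(two_cities)

# Show each chunk with its length so the limit is easy to verify by eye.
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Because the trailing space now counts toward the 160 characters, the chunk boundaries can differ slightly from those produced by '.{1,160}\s'. Two cases this sketch does not handle: a single word longer than 160 characters (the pattern cannot match there, so findall would skip characters), and text containing newlines, which `.` does not match unless re.DOTALL is set.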
Answer 1 (score: 0)
You should avoid regular expressions here, since they can be inefficient.
I would recommend something like this:
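chunks = []
words = two_cities.split(" ")
i = 0
while i < len(words):
    parts = []   # words (each followed by a space) in the current chunk
    length = 0   # number of characters collected in the current chunk
    # Always take at least one word so an over-long word cannot stall the loop.
    while i < len(words) and (not parts or length + len(words[i]) + 1 <= 160):
        parts.append(words[i] + " ")
        length += len(words[i]) + 1
        i += 1
    chunks.append(''.join(parts))
print(chunks)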
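This first creates a list of all the words, then adds each word back with a trailing space.
If a word still fits in the current string, it is appended to it. If it does not, the current string is added to the list and a new string is started. At the end, you have the list of results. Note that every word gets a trailing space, so the last chunk also ends in one; strip it if that matters.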