Question

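I am trying to split the message text for a messaging system into sequences of at most 160 characters that end in a space, unless it is the last sequence, in which case it can end in anything as long as it is 160 characters or fewer.

The regular expression '.{1,160}\s' almost works, but it cuts off the last word of a message, because the last character of a message is usually not a space.

I have also tried '.{1,160}\s|.{1,160}', but that does not work because the final sequence is then just the leftover text after the last space. Does anyone have an idea how to do this?

Example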

import re

two_cities = ("It was the best of times, it was the worst of times, it was " +
         "the age of wisdom, it was the age of foolishness, it was the " +
         "epoch of belief, it was the epoch of incredulity, it was the " +
         "season of Light, it was the season of Darkness, it was the " +
         "spring of hope, it was the winter of despair, we had " +
         "everything before us, we had nothing before us, we were all " +
         "going direct to Heaven, we were all going direct the other " +
         "way-- in short, the period was so far like the present period," +
         " that some of its noisiest authorities insisted on its being " +
         "received, for good or for evil, in the superlative degree of " +
         "comparison only.")


chunks = re.findall(r'.{1,160}\s|.{1,160}', two_cities)
print(chunks)

will return

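['It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of ', 'incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we ', 'had nothing before us, we were all going direct to Heaven, we were all going direct the other way-- in short, the period was so far like the present period, ', 'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison ', 'only.']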

The last element of the list should be

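'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.'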

rather than 'only.'.

Answer 1

Try this: .{1,160}(?:(?<=[ ])|$)

 .{1,160}                      # 1 - 160 chars
 (?:
      (?<= [ ] )                    # Lookbehind, must end with a space
   |  $                             # or, be at End of String
 )

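Info:

By default, the engine first tries to match 160 characters (the quantifier is greedy), then checks the next part of the expression.

The lookbehind forces the last character matched by .{1,160} to be a space.
Alternatively, if the match reaches the end of the string, that requirement is not enforced.

If the lookbehind fails and the match is not at the end of the string, the engine backtracks to 159 characters and checks again. This repeats until one of the two assertions passes.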
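
As a rough check of the behaviour described above, here is a minimal Python sketch that runs the same pattern over the text from the question. The name CHUNK is only illustrative, and the pattern is written with re.VERBOSE so that it mirrors the expanded form; it assumes the message contains no newlines and no single word longer than 160 characters.

```python
import re

two_cities = (
    "It was the best of times, it was the worst of times, it was "
    "the age of wisdom, it was the age of foolishness, it was the "
    "epoch of belief, it was the epoch of incredulity, it was the "
    "season of Light, it was the season of Darkness, it was the "
    "spring of hope, it was the winter of despair, we had "
    "everything before us, we had nothing before us, we were all "
    "going direct to Heaven, we were all going direct the other "
    "way-- in short, the period was so far like the present period,"
    " that some of its noisiest authorities insisted on its being "
    "received, for good or for evil, in the superlative degree of "
    "comparison only."
)

# CHUNK is a hypothetical name; the pattern itself is the one from this answer.
CHUNK = re.compile(r"""
    .{1,160}          # 1 to 160 characters, as many as possible
    (?:
        (?<=[ ])      # the last character matched must be a space
      | $             # or the match must reach the end of the string
    )
""", re.VERBOSE)

chunks = CHUNK.findall(two_cities)

# Show the length of each chunk next to its text.
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Every chunk except the last ends in a space, and the last one keeps the final word. Two caveats, as far as I can tell: a run of more than 160 characters without a space would cause findall to skip some characters silently, and since . does not match a newline by default, messages containing line breaks would probably need re.DOTALL.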

Answer 2

You should avoid using regular expressions here, because they can be inefficient.

I would recommend something like this (see it in action here):

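chunks = []
words = two_cities.split(" ")

i = 0
while i < len(words):
    current = []  # words (each with a trailing space) in the chunk being built
    length = 0    # number of characters in the current chunk
    # Keep adding words while the chunk stays within 160 characters.
    # The first word is always taken, so an overlong word cannot stall the loop.
    while i < len(words) and (not current or length + len(words[i]) + 1 <= 160):
        current.append(words[i] + " ")
        length += len(words[i]) + 1
        i += 1
    chunks.append("".join(current))

print(chunks)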

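This builds the chunks from the individual words, each with a space appended.

If a word fits in the current string, it is added to that string. If it does not, the string is added to the list and a new string is started. At the end, you have a list of results.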
