Regex to split message_txt into 160-character chunks

Date: 2015-01-09 22:26:38

Tags: python regex

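I'm trying to split the message text of a messaging system into sequences of at most 160 characters that end in a space, unless it is the last sequence, which may end with anything as long as it is 160 characters or fewer.

The repeated expression '.{1,160}\s' almost works, but it drops the last word of a message, because the last character of a message is usually not a space.

I also tried '.{1,160}\s|.{1,160}', but that doesn't work either, because the final sequence is just the text remaining after the last space. Does anyone know how to do this?

Example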
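To see both failures without the full text, here is a small sketch that shrinks the limit from 160 to 10 characters. The sample string and the smaller limit are illustrative assumptions, not part of the question.

```python
import re

# Illustrative input; the limit is shrunk from 160 to 10 so the effect is easy to see.
text = "aaa bbb ccc ddd"

# Attempt 1: every chunk must end in whitespace, so the trailing "ddd" is lost.
dropped = re.findall(r'.{1,10}\s', text)
print(dropped)  # ['aaa bbb ', 'ccc ']

# Attempt 2: the first alternative is tried first and still succeeds on "ccc ",
# so the 7-character tail "ccc ddd" is split even though it would fit.
resplit = re.findall(r'.{1,10}\s|.{1,10}', text)
print(resplit)  # ['aaa bbb ', 'ccc ', 'ddd']
```

The result the question is after would be ['aaa bbb ', 'ccc ddd'].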

import re

two_cities = ("It was the best of times, it was the worst of times, it was " +
         "the age of wisdom, it was the age of foolishness, it was the " +
         "epoch of belief, it was the epoch of incredulity, it was the " +
         "season of Light, it was the season of Darkness, it was the " +
         "spring of hope, it was the winter of despair, we had " +
         "everything before us, we had nothing before us, we were all " +
         "going direct to Heaven, we were all going direct the other " +
         "way-- in short, the period was so far like the present period," +
         " that some of its noisiest authorities insisted on its being " +
         "received, for good or for evil, in the superlative degree of " +
         "comparison only.")


chunks = re.findall(r'.{1,160}\s|.{1,160}', two_cities)
print(chunks)

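returns

['It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of ', 'incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we ', 'had nothing before us, we were all going direct to Heaven, we were all going direct the other way-- in short, the period was so far like the present period, ', 'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison ', 'only.']

The last element of the list should be

'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.'

not 'only.'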
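The requirement can be written down as a checker. The sketch below is a hypothetical helper (check_split is not from the question) encoding one reading of the rules: no text is lost, every chunk except the last ends in a space, no chunk exceeds the limit once its trailing spaces are ignored, and a chunk is never closed while the next word would still have fit.

```python
def check_split(text, chunks, limit=160):
    """Hypothetical helper: True if `chunks` satisfies the splitting rules."""
    if "".join(chunks) != text:
        return False  # some text was dropped or altered
    for i, chunk in enumerate(chunks):
        # Trailing spaces are ignored, matching the .{1,160}\s attempt,
        # which allows 160 characters plus the separating space.
        if len(chunk.rstrip(" ")) > limit:
            return False
        if i < len(chunks) - 1:
            if not chunk.endswith(" "):
                return False  # only the last chunk may end without a space
            next_word = chunks[i + 1].split(" ")[0]
            if len(chunk + next_word) <= limit:
                return False  # the next word would still have fit here
    return True


text = "aaa bbb ccc ddd"
print(check_split(text, ['aaa bbb ', 'ccc '], limit=10))         # False: "ddd" lost
print(check_split(text, ['aaa bbb ', 'ccc ', 'ddd'], limit=10))  # False: tail split needlessly
print(check_split(text, ['aaa bbb ', 'ccc ddd'], limit=10))      # True
```

Applied to the output above, only the last boundary should fail: 'that some ... comparison only.' is 137 characters, so 'only.' would have fit in the previous chunk.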

2 answers:

Answer 0 (score: 1)

Try this: .{1,160}(?:(?<=[ ])|$)

 .{1,160}                      # 1 - 160 chars
 (?:
      (?<= [ ] )                    # Lookbehind, must end with a space
   |  $                             # or, be at End of String
 )
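The commented layout above is valid Python verbose-mode syntax, so it can be compiled directly with re.VERBOSE. A minimal sketch; CHUNK_RE and split_message are illustrative names, not from the answer.

```python
import re

# In re.VERBOSE mode, whitespace and "#" comments in the pattern are ignored,
# except inside a character class, so [ ] still matches a literal space.
CHUNK_RE = re.compile(r"""
    .{1,160}              # 1 - 160 chars
    (?:
        (?<= [ ] )        # Lookbehind, must end with a space
      | $                 # or, be at End of String
    )
""", re.VERBOSE)


def split_message(text):
    """Split text into chunks of at most 160 characters."""
    return CHUNK_RE.findall(text)


sample = " ".join(["word"] * 100)  # 499 characters, single spaces
chunks = split_message(sample)
print([len(c) for c in chunks])  # [160, 160, 160, 19]
```

Applied to two_cities, the last element should then be the full 'that some of its noisiest authorities ... comparison only.' tail rather than 'only.'.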

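Explanation:

By default, the engine first tries to match 160 characters (greedy), then checks the next part of the expression.

The lookbehind requires the last character matched by .{1,160} to be a space, unless the match ends at the end of the string, in which case no space is needed.

If the lookbehind fails and the match is not at the end of the string, the engine backtracks to 159 characters and checks again. This repeats until the assertion passes.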
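The backtracking described above can be mirrored in plain Python, which may make the search order easier to follow. This is a hypothetical illustration (split_like_regex is not from the answer), and it assumes the text contains no newlines, since . does not match \n and $ also matches just before a trailing newline.

```python
import re


def split_like_regex(text, limit=160):
    """Hypothetical helper mirroring how re.findall evaluates
    .{1,limit}(?:(?<=[ ])|$) on text without newlines."""
    chunks = []
    start = 0
    while start < len(text):
        # Greedy: try the longest slice first, then give back one character at a time.
        for length in range(min(limit, len(text) - start), 0, -1):
            end = start + length
            # The lookbehind passes if the slice ends in a space; $ passes at end of text.
            if text[end - 1] == " " or end == len(text):
                chunks.append(text[start:end])
                start = end
                break
        else:
            # No length worked (no space in the next `limit` characters and the
            # text does not end there): findall retries one character later.
            start += 1
    return chunks


pattern = r'.{1,10}(?:(?<=[ ])|$)'
for sample in ["aaa bbb ccc ddd", "abcdefghijklmno pq"]:
    print(split_like_regex(sample, limit=10) == re.findall(pattern, sample))
```

The else branch also shows a side effect of the regex: a run of more than 160 characters without a space causes characters to be skipped rather than returned.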

Answer 1 (score: 0)

You should avoid regular expressions here, since they can be inefficient.

I would recommend something like this:

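chunks = []
words = two_cities.split(" ")

i = 0
while i < len(words):
    current = []  # words in the chunk being built
    length = 0    # length of ' '.join(current)
    # Always take at least one word, so a word longer than 160 characters
    # cannot cause an infinite loop; otherwise add words while they fit.
    while i < len(words) and (not current or length + 1 + len(words[i]) <= 160):
        length += len(words[i]) + (1 if current else 0)
        current.append(words[i])
        i += 1
    chunks.append(' '.join(current))

print(chunks)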

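This splits the text on spaces into a list of words.

If the next word still fits in the current chunk, it is added to that chunk. If it doesn't, the chunk is appended to the list and a new chunk is started. At the end you have the list of chunks.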
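If the goal is simply to avoid a hand-written regex, Python's standard library already ships a word wrapper. The sketch below swaps in textwrap.wrap, which is not what this answer uses; wrap_message is an illustrative name. Note that textwrap drops the space at each break, so chunks do not end in a space, and rejoining them with ' ' restores the original only for single-spaced text.

```python
import textwrap


def wrap_message(text, limit=160):
    """Illustrative alternative: split text into chunks of at most `limit`
    characters, breaking only at whitespace."""
    # break_on_hyphens=False stops breaks after hyphens such as "way--";
    # break_long_words=False keeps a word longer than `limit` whole, so such a
    # word becomes a single over-long chunk instead of being cut.
    return textwrap.wrap(text, width=limit,
                         break_on_hyphens=False, break_long_words=False)


sample = " ".join(["word"] * 100)
chunks = wrap_message(sample)
print([len(c) for c in chunks])  # [159, 159, 159, 19]
```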