Python: a module for wrapping text quoted using email conventions?

Posted: 2017-02-11 22:31:11

Tags: python email word-wrap quote

I'm looking for a Python module, or some existing Python code, that can wrap text which uses a ">" line prefix to indicate quoting (see the example below).

I know I can use Python's textwrap module to wrap paragraphs of text. However, that module knows nothing about this kind of quote prefix.
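For illustration, here is a quick sketch of what textwrap does and does not do with such text. Its `initial_indent` and `subsequent_indent` parameters can re-apply a fixed prefix to every wrapped line, but an existing `> ` marker is treated as an ordinary word, so it has to be stripped by hand first. The sample text and the width of 30 are arbitrary choices for the demonstration.

```python
import textwrap

quoted = "> The quick\n> brown fox jumped over the lazy sleeping dog."

# Naive: textwrap turns the newline into a space, so the second ">"
# marker ends up in the middle of the wrapped text.
naive = textwrap.fill(quoted, width=30)

# Strip the "> " prefix by hand, then let textwrap add it back to every
# line. The indent counts toward the width of 30.
body = " ".join(line[2:] for line in quoted.splitlines())
rewrapped = textwrap.fill(body, width=30,
                          initial_indent="> ", subsequent_indent="> ")

print(naive)
print(rewrapped)
```

This only works when the whole block shares one known prefix; mixed quote levels and indented lines, as in the example below, still need the text to be split by prefix first.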

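I know how to write a routine that performs this wrapping, and I'm not asking for advice on how to write it. Rather, I'd like to know whether anyone knows of existing Python code or a Python module that can already do this kind of wrapping on email-style quoted text.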

I've been searching, but I haven't been able to find anything in Python.

I just don't want to "reinvent the wheel" if something like this has already been written.

Here is an example of the text wrapping I want to perform. Suppose I have the following text from a hypothetical email:

Abc defg hijk lmnop.

Mary had a little lamb.
Her fleas were white as snow,

> Now is the time for all good men to come to the aid of their party.
>
> The quick
> brown fox jumped over the lazy sleeping dog.

>> When in the Course of human
>> events it
>> becomes necessary for one people to dissolve the political
>> bands
>> which have
>> connected them ...
      and everywhere that Mary went,
      her fleas were sure to go
      ... and to reproduce.
> What do you mean by this?
>> with another
>> and to assume among
>> the powers of the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text text text text text text text text text text text text text text text text text text.

Suppose I want to wrap at column 52. The resulting text should look like this:

Abc defg hijk lmnop.

Mary had a little lamb. Her fleas were white as
snow,

> Now is the time for all good men to come to the
> aid of their party.
>
> The quick brown fox jumped over the lazy sleeping
> dog.

>> When in the Course of human events it becomes
>> necessary for one people to dissolve the
>> political bands which have connected them ...
      and everywhere that Mary went, her fleas were
      sure to go ... and to reproduce.
> What do you mean by this?
>> with another and to assume among the powers of
>> the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text
>> text text text text text text text text text text
>> text text text text text text text.

Thanks for any pointers to existing Python code.

If no such thing exists "in the wild", I'll write it myself and post my code here.

Thanks very much.

1 answer:

Answer 0 (score: 0)

I couldn't find any existing code that wraps this kind of quoted text, so here is the code I wrote. It uses the re and textwrap modules.

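I split the text into "paragraphs" based on each line's leading quote or indentation prefix: consecutive lines with the same prefix form one paragraph, and a line with nothing after its prefix stands on its own. I then strip the prefix from each line of a paragraph, wrap the remaining text with textwrap to the target width minus the prefix length, and add the prefix back to every wrapped line.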
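The prefix-splitting step described above might look roughly like this in isolation. This is a sketch, not the answer's exact code: `split_prefix`, `QUOTE_RE`, and `INDENT_RE` are hypothetical names, and it assumes that a quote prefix is a run of `>` characters (any spaces between them are dropped, so `> >` becomes `>>`) followed by at most one space, while any other leading whitespace is an indentation prefix kept verbatim.

```python
import re

# Assumed rule: ">" markers (spaces between them allowed), then an
# optional single space that belongs to the prefix, then the body.
QUOTE_RE = re.compile(r'^([\s>]*>)( ?)(.*)$')
# Otherwise, leading whitespace is an indentation prefix.
INDENT_RE = re.compile(r'^(\s+)(.*)$')

def split_prefix(line):
    """Hypothetical helper: return (prefix, body) for one line."""
    line = line.rstrip()
    m = QUOTE_RE.match(line)
    if m:
        markers = re.sub(r'\s', '', m.group(1))  # "> >" -> ">>"
        return markers + m.group(2), m.group(3)
    m = INDENT_RE.match(line)
    if m:
        return m.group(1), m.group(2)
    return '', line

print(split_prefix(">> When in the Course"))    # quoted twice
print(split_prefix("      and everywhere"))     # indented
print(split_prefix("> if a > b"))               # ">" inside the body
```

Lines that produce the same prefix can then be joined and wrapped to the target width minus `len(prefix)`.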

Someday I'll clean up the code below and make it more elegant, but at least it seems to work well.

import re
import textwrap

# Quote prefix: a run of ">" characters, possibly separated by whitespace.
# The body is (.*) rather than ([^>]*) so that a ">" inside the quoted
# text itself (e.g. "> if a > b") does not stop the line counting as quoted.
QPREFIX_PAT = re.compile(r'^([\s>]*>)(.*)$')
# Indentation prefix: leading whitespace followed by text.
WPREFIX_PAT = re.compile(r'^(\s+)(\S.*)?$')
WHITE_PAT   = re.compile(r'\s')

def wrapemail(text, wrap=72):
    """Re-wrap email text to `wrap` columns, keeping quote/indent prefixes."""
    if not text:
        return ''
    paragraphs  = []    # (prefix, [body lines]) in document order
    paragraph   = []    # body lines of the paragraph being collected
    prev_prefix = None  # prefix of that paragraph (None if none is open)
    for line in text.rstrip().split('\n'):
        line = line.rstrip()
        m = QPREFIX_PAT.search(line)
        if m:
            # Normalize "> >" to ">>"; keep one space after the markers.
            prefix = WHITE_PAT.sub('', m.group(1))
            body   = m.group(2)
            if body and WHITE_PAT.match(body[0]):
                prefix += body[0]
                body    = body[1:]
        else:
            m = WPREFIX_PAT.search(line)
            if m:
                prefix = m.group(1)
                body   = m.group(2)
            else:
                prefix = ''
                body   = line
        if not body or prefix != prev_prefix:
            # A blank line or a prefix change closes the open paragraph.
            if paragraph:
                paragraphs.append((prev_prefix, paragraph))
            paragraph   = []
            prev_prefix = None
        if not body:
            # A line with nothing after its prefix is emitted on its own.
            # (Skipping the append below also avoids emitting a trailing
            # ">" line twice.)
            paragraphs.append((prefix, ['']))
            continue
        prev_prefix = prefix
        paragraph.append(body)
    # Flush the last paragraph, if any.
    if paragraph:
        paragraphs.append((prev_prefix, paragraph))
    result = ''
    for prefix, lines in paragraphs:
        text    = ' '.join(lines).rstrip()
        wraplen = wrap - len(prefix)
        if wraplen < 1:
            # The prefix alone fills the line: keep every line unwrapped
            # (each original line, not just the first, gets the prefix).
            for line in lines:
                result += '{}{}\n'.format(prefix, line)
        elif text:
            for line in textwrap.wrap(text, wraplen):
                result += '{}{}\n'.format(prefix, line.rstrip())
        else:
            result += '{}\n'.format(prefix)
    return result

Feeding it the text of the original message, with 'wrap' set to 52, does produce the output I specified above.

Feel free to improve it or steal it. :)
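For comparison, the same idea can be expressed more compactly with `itertools.groupby`, grouping consecutive lines by the pair (prefix, whether the line has text). This is an alternative sketch rather than the answer's code; `wrap_quoted` and `split_prefix` are hypothetical names, and the prefix rules are an assumption modeled on the answer (runs of `>` normalized, one following space kept, other leading whitespace treated as indentation).

```python
import itertools
import re
import textwrap

QUOTE_RE = re.compile(r'^([\s>]*>)( ?)(.*)$')
INDENT_RE = re.compile(r'^(\s+)(.*)$')

def split_prefix(line):
    """Return (prefix, body) for one line, with quote markers normalized."""
    line = line.rstrip()
    m = QUOTE_RE.match(line)
    if m:
        return re.sub(r'\s', '', m.group(1)) + m.group(2), m.group(3)
    m = INDENT_RE.match(line)
    if m:
        return m.group(1), m.group(2)
    return '', line

def wrap_quoted(text, width=72):
    """Hypothetical alternative to wrapemail(): re-wrap prefixed paragraphs."""
    if not text.strip():
        return ''
    parsed = [split_prefix(l) for l in text.rstrip().split('\n')]
    out = []
    # Consecutive lines sharing a prefix and having text form one paragraph;
    # lines with no text after the prefix are kept as separators.
    for (prefix, has_body), group in itertools.groupby(
            parsed, key=lambda pb: (pb[0], bool(pb[1]))):
        bodies = [body for _, body in group]
        if not has_body:
            out.extend(prefix for _ in bodies)  # one line per blank line
            continue
        avail = width - len(prefix)
        if avail < 1:
            # The prefix alone fills the width: leave these lines unwrapped.
            out.extend(prefix + b for b in bodies)
        else:
            wrapped = textwrap.wrap(' '.join(bodies), avail)
            out.extend(prefix + l for l in wrapped)
    return '\n'.join(out) + '\n'

sample = ("> The quick\n> brown fox jumped over the lazy sleeping dog.\n"
          ">\n>> nested\n>> text here\nplain")
print(wrap_quoted(sample, width=30), end='')
```

Because `textwrap.wrap` breaks words longer than the available width by default, a long URL inside a deeply quoted paragraph would be split; passing `break_long_words=False` to `textwrap.wrap` avoids that.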