NLTK TextTiling: optimizing the tokenizer to create relevant paragraphs

Time: 2018-10-10 05:54:29

Tags: python python-3.x nlp data-science text-analysis

This follows on from the question "How to split Text into paragraphs using NLTK nltk.tokenize.texttiling?"

I want to apply the same process to split an email into its different components:

From: X
To: Y                             (LOGISTICS)
Date: 10/03/2017

Hello team,                       (INTRO)

Some text here representing
the body                          (BODY1)
of the text.

Some text here representing
the body                          (BODY2)
of the text.

Some text here representing
the body                          (BODY3)
of the text.

Regards,                          (OUTRO)
X

*****DISCLAIMER*****              (POST EMAIL DISCLAIMER)
THIS EMAIL IS CONFIDENTIAL
IF YOU ARE NOT THE INTENDED RECIPIENT PLEASE DELETE THIS EMAIL

In my case, the (BODY) area of the email will itself contain multiple blocks/paragraphs.

I need to strip all the noise from the top and bottom and keep only the (BODY) part of the email.
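For illustration, here is a minimal rule-based sketch of the filtering described above. It is not TextTiling: it uses only the standard library, splits the email on blank lines, and drops blocks that look like headers, a greeting, a sign-off, or a disclaimer. The regular expressions and the names `split_blocks` and `extract_body` are assumptions made up for this example, and they would need tuning for real mail.

```python
import re

# Hypothetical patterns for this sketch; adjust them to your own mail corpus.
HEADER_RE = re.compile(r"^(From|To|Cc|Bcc|Date|Subject|Sent):", re.IGNORECASE)
GREETING_RE = re.compile(r"^(hello|hi|dear|good (morning|afternoon|evening))\b",
                         re.IGNORECASE)
# A sign-off is assumed to be a short closing phrase alone on its line.
SIGNOFF_RE = re.compile(
    r"^(best regards|kind regards|regards|thanks|thank you|cheers|sincerely)[\s,.!-]*$",
    re.IGNORECASE,
)
DISCLAIMER_RE = re.compile(r"^\W*(disclaimer|this e-?mail is confidential)",
                           re.IGNORECASE)


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]


def extract_body(email_text):
    """Return body paragraphs, dropping headers, greeting, sign-off and disclaimer."""
    body = []
    for block in split_blocks(email_text):
        lines = block.splitlines()
        first_line = lines[0].strip()
        if SIGNOFF_RE.match(first_line) or DISCLAIMER_RE.match(first_line):
            break  # treat everything from the sign-off onward as trailing noise
        if HEADER_RE.match(first_line):
            continue  # skip the header block (From/To/Date ...)
        if len(lines) == 1 and GREETING_RE.match(first_line):
            continue  # skip a one-line greeting such as "Hello team,"
        body.append(block)
    return body
```

Joining the result with `"\n\n".join(extract_body(raw_email))` gives text whose paragraphs are separated by blank lines. That text could then be handed to `nltk.tokenize.TextTilingTokenizer`, which expects paragraph breaks in its input and may reject very short texts, so for short emails the blank-line blocks above may already be sufficient.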

0 answers:

No answers yet