Following up on the question "How to split Text into paragraphs using NLTK nltk.tokenize.texttiling?":
I want to apply the same process to split an email into its different components:
From: X
To: Y (LOGISTICS)
Date: 10/03/2017
Hello team, (INTRO)
Some text here representing
the body (BODY1)
of the text.
Some text here representing
the body (BODY2)
of the text.
Some text here representing
the body (BODY3)
of the text.
Regards, (OUTRO)
X
*****DISCLAIMER***** (POST EMAIL DISCLAIMER)
THIS EMAIL IS CONFIDENTIAL
IF YOU ARE NOT THE INTENDED RECIPIENT PLEASE DELETE THIS EMAIL
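In my case, the (BODY) region of the email itself will contain multiple blocks/paragraphs.
I need to strip all the noise at the top and bottom and keep only the (BODY) part of the email.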
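
To make the desired result concrete, here is a minimal rule-based sketch of the behaviour I am after. It is not TextTiling; it is a plain regex heuristic. It assumes that the header lines look like `From:`/`To:`/`Date:`, that the greeting and sign-off match a small hypothetical word list, that the disclaimer starts with a starred `DISCLAIMER` banner, and that body paragraphs are separated by blank lines. The function name `extract_body` and all of the patterns are my own placeholders.

```python
import re

# Hypothetical patterns: adjust them to the mail you actually receive.
HEADER_RE = re.compile(r"^(From|To|Cc|Bcc|Date|Subject):", re.IGNORECASE)
GREETING_RE = re.compile(r"^(Hello|Hi|Dear)\b.*,\s*$", re.IGNORECASE)
SIGNOFF_RE = re.compile(
    r"^(Regards|Best regards|Kind regards|Thanks|Cheers),?\s*$", re.IGNORECASE
)
DISCLAIMER_RE = re.compile(r"^\*+\s*DISCLAIMER\s*\*+", re.IGNORECASE)


def extract_body(email_text):
    """Return the body paragraphs of an email as a list of strings."""
    lines = email_text.splitlines()

    # Skip leading blank lines and header lines (LOGISTICS).
    start = 0
    while start < len(lines):
        stripped = lines[start].strip()
        if stripped and not HEADER_RE.match(stripped):
            break
        start += 1

    # Skip a single greeting line (INTRO), if present.
    if start < len(lines) and GREETING_RE.match(lines[start].strip()):
        start += 1

    # The body ends at the first sign-off (OUTRO) or disclaimer banner.
    end = len(lines)
    for i in range(start, len(lines)):
        stripped = lines[i].strip()
        if SIGNOFF_RE.match(stripped) or DISCLAIMER_RE.match(stripped):
            end = i
            break

    body = "\n".join(lines[start:end]).strip()

    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
```

Calling `extract_body(text)` on an email shaped like the one above should return only the BODY paragraphs. The obvious weakness is that a body line consisting solely of a word such as "Thanks," would be mistaken for the sign-off, which is why I am looking for something more robust.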