Regex to get the body of an email in raw RFC 822 format

Time: 2018-05-27 07:44:50

Tags: regex pcre

I'm trying to extract just the body from the following snippet:

Subject: test MIME-Version: 1.0 Content-Type: multipart/alternative;
         boundary="----_Part_1631742_816935001.1527406760596" References: <414671049.1631743.1527406760597.ref@mail.yahoo.com>
X-Mailer: WebService/1.1.11897 YMailNorrin Mozilla/5.0 (Macintosh;
Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/66.0.3359.139 Safari/537.36 Content-Length: 416

------_Part_1631742_816935001.1527406760596 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit

test
------_Part_1631742_816935001.1527406760596 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit

<html><head></head><body><div style="font-family:lucida console,
sans-serif;font-size:24px;"><div>test</div></div></body></html>
------_Part_1631742_816935001.1527406760596--

Basically, anything between 7bit and the three dashes (---) that follow it.

I have tried the following regexes without success. Regex #1: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit "test" ---

I thought I could use capture groups to get the content, but I ran into all sorts of problems. I'm using the macOS Terminal with this regex: Content-Type: text/plain;(.*)(\n\n)(.*)---

1 Answer:

Answer 0: (score: 0)

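In PCRE, . does not match newline characters by default. You need the single-line (dotall) modifier to make . match them as well. Try adding (?s) to the regex, like this: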

(?s)Content-Type: text/plain;(.*)(\n\n)(.*)---
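
To see the effect of (?s), here is a minimal sketch in Python. Python's re module is not PCRE, but it accepts the same inline (?s) flag with the same meaning. The sample string is a shortened, hypothetical stand-in for the message in the question, and it assumes the headers are separated from the body by a real blank line.

```python
import re

# Hypothetical, shortened sample: one header line, a blank line,
# a one-line body, then a closing boundary line.
sample = "Content-Type: text/plain; charset=UTF-8\n\ntest\n------_Part_1--\n"

pattern = r"Content-Type: text/plain;(.*)(\n\n)(.*)---"

# Without the flag, "." cannot cross the newline between "test" and the
# boundary line, so the trailing "---" is never reached and nothing matches.
no_flag = re.search(pattern, sample)

# With (?s), "." also matches newlines, so the pattern matches.
with_flag = re.search("(?s)" + pattern, sample)

print(no_flag)                      # None
print(repr(with_flag.group(3)))     # 'test\n---'
```

Note that the third group is greedy, so it runs up to the last --- it can find and picks up some of the boundary dashes. A lazy quantifier avoids that.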

Or you can use:

Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit\s*([\s\S]*?)-{3}
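
As a rough illustration, the lazy pattern can be tried in Python like this. The message below is a shortened, hypothetical version of the one in the question, with each header on its own line, and extract_plain_body is a made-up helper name. Because the two headers may be separated by a newline rather than a space, this sketch uses \s+ between them; that is an adjustment, not part of the pattern above.

```python
import re

# Hypothetical, shortened multipart message with one header per line.
raw = (
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="----_Part_1"\n'
    "\n"
    "------_Part_1\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "test\n"
    "------_Part_1\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "<html><body><div>test</div></body></html>\n"
    "------_Part_1--\n"
)

# [\s\S]*? lazily matches any character, including newlines, and stops
# at the first run of three dashes, so no (?s) flag is needed.
BODY_RE = re.compile(
    r"Content-Type: text/plain; charset=UTF-8\s+"
    r"Content-Transfer-Encoding: 7bit\s*"
    r"([\s\S]*?)-{3}"
)


def extract_plain_body(message):
    """Return the text/plain body, or None if the pattern does not match."""
    match = BODY_RE.search(message)
    return match.group(1).strip() if match else None


print(extract_plain_body(raw))  # test
```

One caveat: since the capture stops at the first ---, a body that itself contains three consecutive dashes would be cut short.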