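Extracting the subject and body of emails into a dictionary with Python

Time: 2018-06-27 11:07:12

Tags: python

I want to extract the subject and body of each email from an email archive (a .txt file) in the format {Subject: Body}. Here is my txt file: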

testing.txt

To: samplemail
From: ssample Sender
Subject: This is the sample request one...
Hey there, 
This is the smaple email just for the test purpose.
No intentions to hurt somebodys feleings at all.
Thanks, 


To: sampleSender2
From: ssampleReciever2
Subject: This is the sample request second...
Hey there, 
this is another sample mail body and just to test the py script working
this si the part of the data preprocesing 
thanks

Here is my Python file:

test.py

txt = "testing.txt"
file = open(txt)
body = ""
body_list = list()
subject_list = list()
for line in file:
    line = line.rstrip()
    if line.startswith("From:") or line.startswith("To:"):
        continue
    if line.startswith("Subject:"):
        subject_list.append(line)
    if not line.startswith("Subject:"):
        body = body + line

Please help me figure out the logic.

3 answers:

Answer 0 (score: 1)

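Your expected output format, {Subject: Body}, looks like a dictionary to me, so I suggest sticking with a dictionary as the container. The code below skips any line that starts with "To:", "From:", or "\n". When it encounters a subject line, it remembers that subject as the current key, and the concatenation of the following lines (up to the next subject line) becomes the value stored under that key.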

with open("testing.txt") as f:
    data = {}
    for line in f:
        if any(line.startswith(kw) for kw in ("From:", "To:", "\n")):
            continue
        if line.startswith("Subject:"):
            # Split on the first colon only, so a subject containing ":" stays intact.
            current_subject = line.split(":", 1)[1].strip()
        else:
            data.setdefault(current_subject, "")
            data[current_subject] += line

print(data)

# {'This is the sample request one...': 'Hey there, \nThis is the smaple email just for the test purpose.\nNo intentions to hurt somebodys feleings at all.\nThanks, \n',
# 'This is the sample request second...': 'Hey there, \nthis is another sample mail body and just to test the py script working\nthis si the part of the data preprocesing \nthanks'}

Feel free to strip any unwanted characters from the lines.
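
As a minimal sketch of that suggestion, assuming each body should end up as a single string with whitespace stripped from every line and the lines joined by spaces. The function name parse_emails and the shortened sample list are hypothetical and not part of the original answer:

```python
def parse_emails(lines):
    """Map each subject to its body, stripping whitespace from every line."""
    data = {}
    current_subject = None
    for line in lines:
        line = line.strip()
        # Skip blank lines and the To:/From: header lines.
        if not line or line.startswith(("From:", "To:")):
            continue
        if line.startswith("Subject:"):
            # Split on the first colon only so subjects may contain ":".
            current_subject = line.split(":", 1)[1].strip()
            data.setdefault(current_subject, [])
        elif current_subject is not None:
            data[current_subject].append(line)
    # Join the cleaned lines of each body with single spaces.
    return {subject: " ".join(body) for subject, body in data.items()}


# Shortened, hypothetical sample in the same layout as testing.txt.
sample = [
    "To: samplemail\n",
    "From: ssample Sender\n",
    "Subject: This is the sample request one...\n",
    "Hey there, \n",
    "Thanks, \n",
]
print(parse_emails(sample))
```

Because the function only iterates over lines, an open file object can be passed in place of the list, for example parse_emails(f) inside a with open("testing.txt") as f: block.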

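I hope this helps.

Answer 1 (score: 0)

The first version below collects the subjects and bodies into two lists. If the subjects are unique, you can put them in a dictionary instead, as in the second part.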

PART 1

    subjects = []
    bodys = []
    with open("test.txt") as file:
        body = ""
        for line in file:
            if line.startswith("From:") or line.startswith("To:"):
                continue
            if line.startswith("Subject:"):
               if body != '':
                   bodys.append(body)
                   body = ""
               subjects.append(line.split("Subject:")[1])
            if not line.startswith("Subject:"):
                body +=line
        bodys.append(body) #appends the last body of the mail
        body = ""
    print(subjects)
    print(bodys)

PART 2

    SB ={}
    with open("test.txt") as file:
        body = ""
        subject = ""
        for line in file:
            if line.startswith("From:") or line.startswith("To:"):
                continue
            if line.startswith("Subject:"):
               if body != '':
                   SB[subject] = body
                   body = ""
               subject = line.split("Subject:")[1]
               SB[subject]=''
            if not line.startswith("Subject:"):
                body +=line
        SB[subject] = body
        body = ""

    print(SB)
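
Note that in PART 2 a repeated subject overwrites the earlier body, because dictionary keys are unique. A hedged sketch of one way around that, assuming a list of bodies per subject is an acceptable result. The function name group_bodies is hypothetical and not part of the original answer:

```python
from collections import defaultdict


def group_bodies(lines):
    """Collect every body under its subject, so repeated subjects are kept."""
    grouped = defaultdict(list)
    subject = None
    body = []
    for line in lines:
        if line.startswith(("From:", "To:")):
            continue
        if line.startswith("Subject:"):
            # A new subject closes the previous email, if there was one.
            if subject is not None:
                grouped[subject].append("".join(body))
            subject = line.split("Subject:", 1)[1].strip()
            body = []
        elif subject is not None:
            body.append(line)
    # Store the body of the last email.
    if subject is not None:
        grouped[subject].append("".join(body))
    return dict(grouped)
```

As with the versions above, the function accepts any iterable of lines, so an open file object works as the argument.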

Answer 2 (score: 0)

email_data, subject, body = {}, "", ""
with open("emails.txt", "r") as records:
    for record in records:
        if record.startswith("Subject:"):
            subject = record.split("Subject:")[1].strip()
        elif not record.startswith("To:") and not record.startswith("From:"):
            body += record
        else:
            subject, body = "", ""
            continue
        email_data[subject] = body
print(email_data)