I want to extract the subject and the email body from an email archive (a .txt file) in the format {Subject: Body}. Here is my txt file:
testing.txt
To: samplemail
From: ssample Sender
Subject: This is the sample request one...
Hey there,
This is the smaple email just for the test purpose.
No intentions to hurt somebodys feleings at all.
Thanks,
To: sampleSender2
From: ssampleReciever2
Subject: This is the sample request second...
Hey there,
this is another sample mail body and just to test the py script working
this si the part of the data preprocesing
thanks
Here is my Python file:
test.py
txt = "testing.txt"
file = open(txt)
body = ""
body_list = list()
subject_list = list()
for line in file:
    line = line.rstrip()
    if line.startswith("From:") or line.startswith("To:"):
        continue
    if line.startswith("Subject:"):
        subject_list.append(line)
    if not line.startswith("Subject:"):
        body = body + line
Please help me figure out the logic.
Answer 0 (score: 1)
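The expected output format {Subject: Body} looks like a dictionary to me, so I suggest you stick with a dictionary as the container. The following skips any line that starts with "To:", "From:", or "\n". When it encounters a subject line, it creates an entry in the dictionary for that subject, and the concatenation of the subsequent lines (up to the next subject line) becomes the value for the current subject.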
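with open("testing.txt") as f:
    data = {}
    current_subject = None  # no subject line seen yet
    for line in f:
        # Skip the To:/From: headers and blank lines.
        if line.startswith(("From:", "To:", "\n")):
            continue
        if line.startswith("Subject:"):
            # Split on the first colon only, so a subject containing ":" stays intact.
            current_subject = line.split(":", 1)[1].strip()
            data.setdefault(current_subject, "")
        elif current_subject is not None:
            # Body line: append it to the body of the current subject.
            data[current_subject] += line
print(data)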
# {'This is the sample request one...': 'Hey there, \nThis is the smaple email just for the test purpose.\nNo intentions to hurt somebodys feleings at all.\nThanks, \n',
# 'This is the sample request second...': 'Hey there, \nthis is another sample mail body and just to test the py script working\nthis si the part of the data preprocesing \nthanks'}
Feel free to strip any unwanted characters from the lines.
I hope this helps.
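As a minimal sketch of the stripping step, assuming the archive layout shown in the question: the helper name `parse_emails` and the in-memory `sample` text are hypothetical and only serve the illustration. Each body line is stripped of surrounding whitespace and the lines are joined with a single newline.

```python
import io


def parse_emails(lines):
    """Return {subject: body}; body lines are stripped and joined with newlines."""
    data = {}
    current_subject = None
    for line in lines:
        line = line.strip()
        # Skip blank lines and the To:/From: headers.
        if not line or line.startswith(("To:", "From:")):
            continue
        if line.startswith("Subject:"):
            # Split on the first colon only, so subjects containing ":" survive.
            current_subject = line.split(":", 1)[1].strip()
            data.setdefault(current_subject, [])
        elif current_subject is not None:
            data[current_subject].append(line)
    return {subject: "\n".join(body) for subject, body in data.items()}


# Hypothetical in-memory stand-in for the archive file.
sample = io.StringIO(
    "To: samplemail\n"
    "From: ssample Sender\n"
    "Subject: This is the sample request one...\n"
    "Hey there, \n"
    "Thanks, \n"
)
result = parse_emails(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way, for example `parse_emails(open("testing.txt"))`.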
Answer 1 (score: 0)
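The first version below collects the subjects and bodies in two lists. If the subjects are distinct, you can put them in a dictionary instead, as in the second version.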
# Version 1: subjects and bodies in two lists
subjects = []
bodys = []
with open("test.txt") as file:
    body = ""
    for line in file:
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            if body != '':
                bodys.append(body)  # store the body of the previous mail
                body = ""
            subjects.append(line.split("Subject:")[1])
        if not line.startswith("Subject:"):
            body += line
    bodys.append(body)  # appends the last body of the mail
    body = ""
print(subjects)
print(bodys)
# Version 2: a dictionary mapping subject to body
SB = {}
with open("test.txt") as file:
    body = ""
    subject = ""
    for line in file:
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            if body != '':
                SB[subject] = body  # store the body of the previous mail
                body = ""
            subject = line.split("Subject:")[1]
            SB[subject] = ''
        if not line.startswith("Subject:"):
            body += line
    SB[subject] = body  # store the body of the last mail
    body = ""
print(SB)
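A dictionary holds one value per key, so in the dictionary version a repeated subject line overwrites the earlier body. One possible way around that, sketched here under the assumption that every email contains exactly one "Subject:" line, is to keep a list of bodies per subject. The name `group_by_subject` and the sample `lines` are hypothetical.

```python
from collections import defaultdict


def group_by_subject(lines):
    """Return {subject: [body, ...]} with one list entry per email."""
    grouped = defaultdict(list)
    subject = None
    for line in lines:
        if line.startswith(("To:", "From:")):
            continue
        if line.startswith("Subject:"):
            subject = line.split("Subject:", 1)[1].strip()
            grouped[subject].append("")  # start a new body for this email
        elif subject is not None:
            grouped[subject][-1] += line  # extend the body of the current email
    return dict(grouped)


# Hypothetical sample: two emails sharing the same subject.
lines = [
    "To: a\n", "From: b\n", "Subject: Hello\n", "first body\n",
    "To: c\n", "From: d\n", "Subject: Hello\n", "second body\n",
]
grouped = group_by_subject(lines)
print(grouped)
```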
Answer 2 (score: 0)
email_data, subject, body = {}, "", ""
with open("emails.txt", "r") as records:
    for record in records:
        if record.startswith("Subject:"):
            subject = record.split("Subject:")[1].strip()
        elif not record.startswith("To:") and not record.startswith("From:"):
            body += record
        else:
            subject, body = "", ""
            continue
        email_data[subject] = body
print(email_data)