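I want to run a statistical analysis on my emails. To do that, I select the emails of interest in Outlook and save them as a txt file.
Here is a sample of what you get (approximate, because of translation):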
Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf
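Obviously, to manage my data it would be best to split it into columns. The column labels would be {Send, To, Cc, Object, enclosed}, with one row per email.
I'm sure there is a nice way to do this, perhaps with pandas, but I haven't hit on the right keywords to find a working answer.
Any hints that could help me?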
Answer 0 (score: 0)
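Assuming:
1) there is a blank line between each email's block of information
2) each block always has the same 5 fields (Send, To, Cc, Object, enclosed), and they always appear in the same order
3) no field is empty (for example, every email has an attachment)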
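These three assumptions can be checked before parsing. The following is only a minimal sketch, assuming the exported file has already been read into a string; check_blocks and EXPECTED_KEYS are hypothetical names made up for illustration, not part of any library.

```python
# Hypothetical helper: verifies the three assumptions on a raw text export.
EXPECTED_KEYS = ["Send", "To", "Cc", "Object", "enclosed"]


def check_blocks(raw_text):
    """Return a list of (block_index, problem) tuples; an empty list means all checks passed."""
    problems = []
    # Assumption 1: blocks are separated by a blank line
    blocks = [b for b in raw_text.strip().split("\n\n") if b.strip()]
    for i, block in enumerate(blocks):
        keys = []
        for line in block.split("\n"):
            # partition splits on the first colon only, so "12:00" is left alone
            key, sep, value = line.partition(":")
            if not sep:
                problems.append((i, "line without a colon: " + line))
                continue
            keys.append(key.strip())
            # Assumption 3: no empty values
            if not value.strip():
                problems.append((i, "empty value for " + key.strip()))
        # Assumption 2: the same five fields, in the same order
        if keys != EXPECTED_KEYS:
            problems.append((i, "unexpected fields: " + repr(keys)))
    return problems


good_block = (
    "Send: monday 9 jully 2018 12:00\n"
    "To: john doe\n"
    "Cc: sister doe; brother doe; mother doe\n"
    "Object: my data issue\n"
    "enclosed: data.pdf"
)
two_good = good_block + "\n\n" + good_block
one_bad = good_block + "\n\nSend: monday 9 jully 2018 12:00\nTo: john doe\nObject: no cc here"

print(check_blocks(two_good))
print(check_blocks(one_bad))
```

If the list comes back empty, the simple parsing below should apply as is; otherwise the reported blocks probably need special handling first.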
raw_text = """Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf"""

# Each email block is separated from the next by a blank line
emails = raw_text.split('\n\n')
output = []
for email in emails:
    row = []
    for line in email.split('\n'):
        # Split on the first colon only, so a time such as 12:00 stays intact
        row.append(line.split(':', 1)[1].strip())
    output.append(row)
print(output)
output will be a list of lists: 3 rows by 5 columns for your example. It can later be converted to a DataFrame fairly easily if needed.
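As an illustration of that last step, here is a minimal sketch of the conversion with pandas. It assumes output has the shape described above (it is rebuilt by hand here so the snippet runs on its own) and that the column names are the labels from the question; splitting Cc into a list of names is an optional extra, not something the question asked for.

```python
import pandas as pd

# Stand-in for the parsed result: 3 rows, 5 values each
row = [
    "monday 9 jully 2018 12:00",
    "john doe",
    "sister doe; brother doe; mother doe",
    "my data issue",
    "data.pdf",
]
output = [list(row) for _ in range(3)]

# Column labels taken from the question
columns = ["Send", "To", "Cc", "Object", "enclosed"]
df = pd.DataFrame(output, columns=columns)

# Optional: turn the semicolon-separated Cc string into a list of names
df["Cc"] = df["Cc"].apply(lambda cc: [name.strip() for name in cc.split(";")])

print(df.shape)
print(df.head())
```

From here the usual pandas tools (groupby, value_counts and so on) can be used for the statistics.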