Question

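I have a plain text file full of emails. From each email I need to take the timestamp (MM/DD/YYYY HH:MM:SS) and the team name that follows it (always in the same place in each block), and write them to a new file with the timestamp changed to YYYY-MM-DD. I know basic input and output for stripping strings, but I don't know how to pull multiple dates and team names out of the file when I don't know in advance exactly what they are (which is why I'm searching for them).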

Here is what the emails look like:

To: address@address.com 
From: address@address.com 
Date: MM/DD/YYYY HH:MM:SS 
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, TEAMNAME, blah blah blah

To: address@address.com 
From: address@address.com 
Date: MM/DD/YYYY HH:MM:SS 
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, TEAMNAME, blah blah blah

This repeats for more than 100 emails. If there is a language that makes this easier to write, please let me know!

Answer 1

Assuming every email starts with "To:", we can split the file into individual emails and search each one for the fields we need.

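import re  # use regular expressions

# Read the whole file; "with" closes it even if an error occurs.
with open("myEmails.txt") as f:
    mails = f.read()

# One chunk per email. The chunk before the first "To:" is empty.
mails = mails.split("To:")
result = []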

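Okay, now each email is a string in our list mails. Let's be blatantly ignorant about regular expressions and assume every email looks exactly like the sample above: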

for mail in mails:
    # Skip the empty chunk that precedes the first "To:".
    if not mail.strip():
        continue
    # Let's use a regular expression that matches your date.
    # \d stands for any numeric character (raw string avoids escape warnings).
    date = re.findall(r"\d\d/\d\d/\d\d\d\d", mail)[0]
    # Use a regular expression, a datetime object, or
    # just plain string slicing to build the new date.
    # string[begin:end] gives you a substring of string.
    # date is "MM/DD/YYYY": MM = date[:2], DD = date[3:5], YYYY = date[6:]
    new_date = date[6:] + "-" + date[:2] + "-" + date[3:5]
    # Find the first occurrence of "IQA, " and assume
    # the team name follows it, up to the next comma.
    teamname_start = mail.find("IQA, ") + 5
    teamname = mail[teamname_start:mail.find(",", teamname_start)]
    result.append((new_date, teamname))

You end up with a list of tuples that you can save:

# Write one line per email; "\n" keeps entries on separate lines.
with open("output.txt", "w") as f:
    for date, team in result:
        f.write("%s: team %s joined\n" % (date, team))
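The comments in the loop mention a datetime object as an alternative to string slicing. A minimal sketch of that, assuming the timestamp always has the MM/DD/YYYY HH:MM:SS layout from the question (to_iso_date is a hypothetical helper name):

```python
from datetime import datetime


def to_iso_date(timestamp: str) -> str:
    """Convert 'MM/DD/YYYY HH:MM:SS' to 'YYYY-MM-DD'.

    Assumes the exact layout shown in the question; strptime raises
    ValueError if the text does not match it.
    """
    parsed = datetime.strptime(timestamp, "%m/%d/%Y %H:%M:%S")
    return parsed.strftime("%Y-%m-%d")


print(to_iso_date("03/14/2015 09:26:53"))  # 2015-03-14
```

Unlike slicing, strptime validates the fields, so a malformed or swapped timestamp fails loudly instead of silently producing a wrong date string.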

Answer 2

You need to use a regular expression to get it.

Here is the starting point you've been looking for.
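As a minimal sketch of the regex approach, the pattern below scans the whole file at once with re.finditer and named groups instead of splitting on "To:". It assumes every email contains a "Date: MM/DD/YYYY HH:MM:SS" line followed later by the exact wording "Hi, and welcome to the IQA, TEAMNAME,"; EMAIL_RE, extract, and the sample addresses and team names are hypothetical.

```python
import re

# Hypothetical pattern: a Date line, then (possibly several lines later)
# "welcome to the IQA, <team>,". DOTALL lets .*? cross line breaks.
EMAIL_RE = re.compile(
    r"Date:\s*(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})\s+\d{2}:\d{2}:\d{2}"
    r".*?welcome to the IQA,\s*(?P<team>[^,]+),",
    re.DOTALL,
)


def extract(text: str) -> list[tuple[str, str]]:
    """Return (YYYY-MM-DD, team name) pairs in file order."""
    return [
        (f"{m['year']}-{m['month']}-{m['day']}", m["team"].strip())
        for m in EMAIL_RE.finditer(text)
    ]


sample = """To: a@example.com
From: b@example.com
Date: 03/14/2015 09:26:53
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, Red Team, blah blah blah

To: a@example.com
From: b@example.com
Date: 04/01/2015 10:00:00
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, Blue Team, blah blah blah
"""

print(extract(sample))  # [('2015-03-14', 'Red Team'), ('2015-04-01', 'Blue Team')]
```

The Subject line does not match because the pattern is case-sensitive and requires a comma after "IQA". One caveat of scanning the whole file: if an email lacks the welcome sentence, the lazy .*? runs on into the next email, pairing the first date with the next team name and skipping the second date. Splitting into per-email chunks first avoids that.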

Extracting multiple timestamps and usernames from a text file
