Extracting multiple timestamps and usernames from a text file

Asked: 2012-03-19 15:40:33

Tags: python

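I have a basic text file full of multiple emails. I need to take the timestamp (MM/DD/YYYY HH:MM:SS) and the team name that follows it (always in the same place in each block) and put them into a new file, with the timestamp changed to YYYY-MM-DD. I know the basics of input, output, and stripping strings, but I don't know how to pull multiple dates and team names out of a file when I don't know in advance what they are (which is why I'm searching for them).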

Here is what the emails look like:

To: address@address.com 
From: address@address.com 
Date: MM/DD/YYYY HH:MM:SS 
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, TEAMNAME, blah blah blah

To: address@address.com 
From: address@address.com 
Date: MM/DD/YYYY HH:MM:SS 
Subject: Welcome to the IQA!

Hi, and welcome to the IQA, TEAMNAME, blah blah blah

This repeats for over 100 emails. If there is a language that would make this easier to write, please let me know!

2 answers:

Answer 0: (score: 1)

Assuming every email starts with To:, we can split the text on that marker and search each piece separately.

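import re  # regular expressions

# Read the whole file at once; "myEmails.txt" is the example file name.
with open("myEmails.txt") as f:
    mails = f.read()

# Split into one string per email; drop the empty chunk before the first "To:".
mails = [mail for mail in mails.split("To:") if mail.strip()]
result = []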

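OK, now each mail is a string in our list mails. Let's keep the regular expression deliberately naive and assume the first MM/DD/YYYY match in each mail is its date: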

for mail in mails:
    # Match the MM/DD/YYYY part of the Date: header.
    # \d stands for any digit.
    dates = re.findall(r"\d\d/\d\d/\d\d\d\d", mail)
    if not dates:
        # Skip chunks without a date, e.g. the empty text before the first "To:".
        continue
    date = dates[0]
    # You could use a regular expression or a datetime object here, but
    # plain string slicing is enough: string[begin:end] gives a substring.
    # MM/DD/YYYY -> YYYY-MM-DD
    new_date = date[6:] + "-" + date[:2] + "-" + date[3:5]
    # Find the first occurrence of "IQA, " and assume
    # the team name follows, up to the next comma.
    teamname_start = mail.find('IQA, ') + 5
    teamname = mail[teamname_start:mail.find(',', teamname_start)]
    result.append((new_date, teamname))

You end up with a list of tuples that you can save:

with open("output.txt", "w") as f:
    for date, team in result:
        # End each record with a newline so entries land on separate lines.
        f.write("%s: team %s joined\n" % (date, team))
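
The comments in the loop mention a datetime object as an alternative to string slicing. A minimal sketch of that route, assuming the timestamps really are in MM/DD/YYYY HH:MM:SS form as in the question; the helper name to_iso_date is made up for illustration:

```python
from datetime import datetime


def to_iso_date(timestamp):
    """Convert 'MM/DD/YYYY HH:MM:SS' to 'YYYY-MM-DD' (hypothetical helper)."""
    # strptime raises ValueError if the text is not a valid timestamp,
    # which catches malformed dates that plain slicing would pass through.
    parsed = datetime.strptime(timestamp, "%m/%d/%Y %H:%M:%S")
    return parsed.strftime("%Y-%m-%d")


print(to_iso_date("03/19/2012 15:40:33"))  # prints 2012-03-19
```

To use it in the loop, the regular expression would need to capture the time as well as the date, so that the full timestamp can be passed in.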

Answer 1: (score: 0)

You need to use regular expressions to get this.

Here is a starting point for what you are looking for.
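
A rough sketch of what a regex-only approach might look like, assuming the email layout shown in the question. The pattern, the function name extract_dates_and_teams, and the sample addresses and team names are illustrative assumptions, not part of the original answer:

```python
import re

# Assumed layout: a "Date: MM/DD/YYYY HH:MM:SS" header, followed later by
# "welcome to the IQA, TEAMNAME," in the body. DOTALL lets ".*?" cross lines.
EMAIL_PATTERN = re.compile(
    r"Date: (\d{2})/(\d{2})/(\d{4}) \d{2}:\d{2}:\d{2}"
    r".*?welcome to the IQA, ([^,\n]+),",
    re.DOTALL,
)


def extract_dates_and_teams(text):
    """Return a list of (YYYY-MM-DD, team name) tuples found in text."""
    return [
        (f"{year}-{month}-{day}", team)
        for month, day, year, team in EMAIL_PATTERN.findall(text)
    ]


sample = (
    "To: a@example.com \nFrom: b@example.com \n"
    "Date: 03/19/2012 15:40:33 \nSubject: Welcome to the IQA!\n\n"
    "Hi, and welcome to the IQA, Red Team, blah blah blah\n\n"
    "To: a@example.com \nFrom: b@example.com \n"
    "Date: 12/01/2011 09:05:00 \nSubject: Welcome to the IQA!\n\n"
    "Hi, and welcome to the IQA, Blue Team, blah blah blah\n"
)

for date, team in extract_dates_and_teams(sample):
    print(date, team)
```

One caveat with this design: if an email lacks the greeting line, the lazy match runs on into the next email and pairs the wrong team with the date, so splitting into individual emails first is the safer choice for messy input.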