I'm new to Python and want to compare two text files. The first text file has the following format:
Date: 11/30/2010
Time: 21:04:10
From: John
To: Ed
Protocol: SMTP
Date: 11/30/2010
Time: 15:14:19
From: Fred
To: John
Protocol: SMTP
Date: 08/15/2010
Time: 09:11:12
From: Sue
To: Tom
Protocol: POP
.
.
.
The second file has the following format:
Data:3 xxxx 2010-08-15 09:11:12
Type IV send now
Sue -> Tom
Protocol: SMTP
Data:23 xxxx 2010-07-15 09:11:12
Type V send now
Fred -> ED
Protocol: FTP
SMTP
Data:45 xxxx 2010-06-15 09:11:12
Type IV send now
Fred -> Sue
Protocol: POP
SMTP
.
.
.
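I need to read a block of name/value pairs from file 1 and use the "Date, Time, From, To" values it provides to find all matching blocks in file 2. For example, the only match in the files would be between:
Date: 08/15/2010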
Time: 09:11:12
From: Sue
To: Tom
Protocol: POP
and
Data:45 xxxx 2010-06-15 09:11:12
Type IV send now
Fred -> Sue
Protocol: POP
SMTP
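I've started the following code, but I'm stuck on how best to do the parsing and the comparison. Based on other languages I've used, I would put the five name/value pairs of a file 1 block into a data type that I could then reference to find matches in the second file. Any help is most appreciated. Thanks.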
import os

def main():
    # read file
    file1 = open(os.path.expanduser("~/Documents/file1.txt"), "r")
    lines = file1.readlines()
    # parse
    for line in lines:
        line = line.strip()
        # print(line)
        if line == "Date:":
            print(line)
    file1.close()

main()
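Answer 0: (score: 1)
res is a list that starts out empty. While going through each line of file1, if the line contains a date, convert the date format to match the date format in file2. Check whether that date exists in file2. If it does, get the start and end index of the block (assuming blocks are separated by an empty line) and append the contents of the file1 block and the file2 block to res as a list.
If the date is found more than once, you would have to iterate over the found list, depending on the contents of file2. That scenario is not handled below.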
from datetime import datetime

def main():
    # read both files into lists of lines
    with open("file1.txt", "r") as file1:
        lines = file1.read().splitlines()
    with open("file2.txt", "r") as file2:
        cmplines = file2.read().splitlines()
    res = []
    for ind, line in enumerate(lines):
        line = line.strip()
        if "Date:" in line:
            _, date = line.split(":")
            date = date.strip()
            # convert MM/DD/YYYY (file1) to YYYY-MM-DD (file2)
            date = datetime.strptime(date, "%m/%d/%Y")
            date = date.strftime("%Y-%m-%d")
            found = [i for i, l in enumerate(cmplines) if date in l]  # check for date in file2
            if found:
                # block end index for file1; index() is relative to the slice, so add the offset
                end = ind + lines[ind:].index("") if "" in lines[ind:] else len(lines)
                # block end index for file2
                end2 = found[0] + cmplines[found[0]:].index("") if "" in cmplines[found[0]:] else len(cmplines)
                res.append([lines[ind:end], cmplines[found[0]:end2]])
    for file1_content, file2_content in res:
        print(file1_content, file2_content)
        print("\n")

main()
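The date conversion and the collection of file1 blocks can also be tried in isolation. Below is a minimal sketch, assuming each file1 block consists of the "Name: value" lines shown in the question and that a new block starts at every Date: line. The helper names to_iso_date and parse_file1 are hypothetical and not part of the code above.

```python
from datetime import datetime


def to_iso_date(us_date):
    """Convert 'MM/DD/YYYY' (file1 style) to 'YYYY-MM-DD' (file2 style)."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


def parse_file1(lines):
    """Group 'Name: value' lines into one dict per block.

    Assumption: a new block starts at every 'Date:' line.
    """
    blocks = []
    for raw in lines:
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and anything that is not a pair
        # split only once so that 'Time: 21:04:10' keeps its value intact
        name, value = line.split(":", 1)
        if name == "Date":
            blocks.append({})
        if blocks:
            blocks[-1][name] = value.strip()
    return blocks


sample = [
    "Date: 08/15/2010",
    "Time: 09:11:12",
    "From: Sue",
    "To: Tom",
    "Protocol: POP",
]
block = parse_file1(sample)[0]
print(to_iso_date(block["Date"]), block["Time"])
```

Splitting on the first colon only matters for the Time: lines, where a plain split(":") would return four pieces instead of two.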
Hope it helps!
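Answer 1: (score: 0)
OK, this took a little while, but I have a decent beta version for you to play with. I'm not quite sure how you want to structure the output, but it's almost there.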
import datetime
import re

with open("file1.txt", mode="r", newline="\n") as file1:
    text1 = file1.read()  # read once; a second read() on the same handle returns ""
with open("file2.txt", mode="r", newline="\n") as file2:
    # the last three lines of file2 are skipped
    lines2 = [line.strip() for line in file2.readlines()[:-3]]

# removes the "3 xxxx " part that follows "Data:"
fixdate = re.compile(r"[0-9].*?\s[a-z]\D*")

for i in range(len(lines2)):
    line = fixdate.sub("", lines2[i])
    if "Protocol" in line:
        li = [line.split(":", 1)[1].strip()]
        # a block may list a second protocol on the following line
        if i + 1 < len(lines2) and not lines2[i + 1].startswith("Data"):
            li.append(lines2[i + 1].strip())
        for l in li:
            if re.search(r"(?<=Protocol:) %s" % re.escape(l), text1):
                print("Matched Protocol: %s" % l)
    elif "Data" in line:
        dt = line.replace("Data:", "").split(" ")
        d = datetime.datetime.strptime(dt[0], "%Y-%m-%d").strftime("%m/%d/%Y")
        if re.search(r"(?<=Date:) %s" % d, text1):
            print("Matched Date: %s" % d)
        if re.search(r"(?<=Time:) %s" % dt[1], text1):
            print("Matched Time: %s" % dt[1])
    elif "->" in line:
        # the name left of "->" is the sender (From:), the right one the receiver (To:)
        sender, recv = [name.strip() for name in line.split("->")]
        if re.search(r"(?<=From:) %s" % re.escape(sender), text1):
            print("Matched Sender: %s" % sender)
        if re.search(r"(?<=To:) %s" % re.escape(recv), text1):
            print("Matched Receiver: %s" % recv)
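If the goal is to match whole blocks rather than individual fields, each file2 block can be reduced to a (date, time, sender, receiver) key and compared with the same key built from a file1 block. Below is a minimal sketch, assuming every file2 block starts with a Data: line that ends in "YYYY-MM-DD HH:MM:SS" and contains one "sender -> receiver" line. The names file2_keys, file1_key and match_blocks are hypothetical, file1 blocks are assumed to be dicts already, and names are compared case-sensitively (so "ED" does not match "Ed").

```python
import re
from datetime import datetime

# e.g. "Data:3 xxxx 2010-08-15 09:11:12" -> ("2010-08-15", "09:11:12")
DATA_RE = re.compile(r"^Data:.*?(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s*$")
# e.g. "Sue -> Tom" -> ("Sue", "Tom")
ARROW_RE = re.compile(r"^(\S+)\s*->\s*(\S+)$")


def file2_keys(lines):
    """Yield one (date, time, sender, receiver) tuple per file2 block."""
    date = time = None
    for raw in lines:
        line = raw.strip()
        data = DATA_RE.match(line)
        if data:
            date, time = data.groups()
            continue
        arrow = ARROW_RE.match(line)
        if arrow and date is not None:
            yield (date, time, arrow.group(1), arrow.group(2))
            date = time = None  # wait for the next Data: line


def file1_key(block):
    """Build the same kind of key from a file1 block given as a dict."""
    iso = datetime.strptime(block["Date"], "%m/%d/%Y").strftime("%Y-%m-%d")
    return (iso, block["Time"], block["From"], block["To"])


def match_blocks(file1_blocks, file2_lines):
    """Return the file1 blocks whose key also occurs in file2."""
    keys2 = set(file2_keys(file2_lines))
    return [b for b in file1_blocks if file1_key(b) in keys2]


file1_blocks = [
    {"Date": "11/30/2010", "Time": "21:04:10", "From": "John", "To": "Ed", "Protocol": "SMTP"},
    {"Date": "08/15/2010", "Time": "09:11:12", "From": "Sue", "To": "Tom", "Protocol": "POP"},
]
file2_lines = [
    "Data:3 xxxx 2010-08-15 09:11:12",
    "Type IV send now",
    "Sue -> Tom",
    "Protocol: SMTP",
    "Data:23 xxxx 2010-07-15 09:11:12",
    "Type V send now",
    "Fred -> ED",
    "Protocol: FTP",
    "SMTP",
]
for block in match_blocks(file1_blocks, file2_lines):
    print(block)
```

Putting the file2 keys into a set keeps the lookup cheap when file1 contains many blocks. The protocol is deliberately left out of the key because the question only lists Date, Time, From and To as match criteria.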