Comparing two very different text files in Python

Time: 2017-03-12 05:53:15

Tags: python

I'm new to Python and want to compare two text files. The first text file has the following format:

Date: 11/30/2010
Time: 21:04:10
From: John
To: Ed
Protocol: SMTP

Date: 11/30/2010
Time: 15:14:19
From: Fred
To: John
Protocol: SMTP

Date: 08/15/2010
Time: 09:11:12
From: Sue
To: Tom
Protocol: POP
.
.
.

The second file has the following format:

Data:3 xxxx 2010-08-15 09:11:12
Type IV send now
Sue -> Tom
Protocol: SMTP

Data:23 xxxx 2010-07-15 09:11:12
Type V send now
Fred -> ED
Protocol: FTP
          SMTP

Data:45 xxxx 2010-06-15 09:11:12
Type IV send now
Fred -> Sue
Protocol: POP
          SMTP
.
.
.

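I need to read a block of name/value pairs from file 1 and use the "Date, Time, From, To" values it provides to find all matching blocks in file 2. For example, the only match in the files would be between the following two blocks:

Date: 08/15/2010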
Time: 09:11:12
From: Sue
To: Tom
Protocol: POP

Data:45 xxxx 2010-06-15 09:11:12
Type IV send now
Fred -> Sue
Protocol: POP
          SMTP

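I have started the following code, but I'm stuck on how best to do the parsing and comparing. Based on other languages I've used, I would put the five name/value pairs of a file 1 block into a data type that I could then reference to find the match in the second file. Any help is most appreciated. Thanks.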

import os

def main():
    # read file
    with open(os.path.expanduser("~/Documents/file1.txt"), "r") as file1:
        lines = file1.readlines()

    # parse
    for line in lines:
        line = line.strip()
        # print(line)
        if line.startswith("Date:"):
            print(line)

main()
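
The "data type" described above could be a list of dictionaries, one dictionary per block. This is only a minimal sketch: it assumes blocks are separated by blank lines and that every field sits on its own "Name: value" line, and `parse_blocks` is a hypothetical helper name, not something from the question's code.

```python
def parse_blocks(text):
    """Split text into blank-line-separated blocks of 'Name: value' lines."""
    blocks = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current block.
            if current:
                blocks.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the "." placeholders
            current[name.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


sample = """Date: 11/30/2010
Time: 21:04:10
From: John
To: Ed
Protocol: SMTP

Date: 08/15/2010
Time: 09:11:12
From: Sue
To: Tom
Protocol: POP
"""

records = parse_blocks(sample)
print(records[1]["From"], records[1]["Time"])
```

Splitting on the first colon only (via `partition`) keeps time values such as 09:11:12 intact.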

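2 Answers:

Answer 0: (score: 1)

res is a list of lists that starts out empty. As you go through each line of file1, if the line contains a date, convert the date format to match the date format in file2.

Check whether that date exists in file2. If it does, get the start and end indices of the block (assuming blocks are separated by blank lines) and append the contents of the file1 block and the file2 block to res as a list.

If there are multiple matches in file2, iterate over the found list. How to do that depends on the content of file2! This scenario is not handled below.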

from datetime import datetime


def main():
    # read both files into lists of lines (without trailing newlines)
    with open("file1.txt", "r") as file1:
        lines = file1.read().splitlines()
    with open("file2.txt", "r") as file2:
        cmplines = file2.read().splitlines()
    res = []
    for ind, line in enumerate(lines):
        line = line.strip()
        if "Date:" in line:
            date = line.split(":", 1)[1].strip()
            # convert MM/DD/YYYY (file1) to YYYY-MM-DD (file2)
            date = datetime.strptime(date, "%m/%d/%Y").strftime("%Y-%m-%d")
            found = [i for i, l in enumerate(cmplines) if date in l]  # check for date in file2
            if found:
                # block end index for file1: the next blank line, or the end of the file
                end = ind + lines[ind:].index("") if "" in lines[ind:] else len(lines)
                # block end index for file2 (only the first match is used)
                start2 = found[0]
                end2 = start2 + cmplines[start2:].index("") if "" in cmplines[start2:] else len(cmplines)
                res.append([lines[ind:end], cmplines[start2:end2]])

    for file1_content, file2_content in res:
        print(file1_content, file2_content)
        print("\n")

main()
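
The multiple-match case mentioned above could be covered by pairing each file1 block with every file2 block that contains its date. This is a minimal sketch, assuming blocks are separated by blank lines; `split_blocks` and `match_by_date` are hypothetical helper names, and the sample lines are made up for illustration.

```python
from datetime import datetime


def split_blocks(lines):
    """Group lines into blocks separated by blank lines."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def match_by_date(lines1, lines2):
    """Pair each file1 block with every file2 block that mentions its date."""
    blocks2 = split_blocks(lines2)
    res = []
    for block1 in split_blocks(lines1):
        for line in block1:
            if line.startswith("Date:"):
                raw = line.split(":", 1)[1].strip()
                # MM/DD/YYYY (file1) to YYYY-MM-DD (file2)
                iso = datetime.strptime(raw, "%m/%d/%Y").strftime("%Y-%m-%d")
                matches = [b for b in blocks2 if any(iso in l for l in b)]
                if matches:
                    res.append((block1, matches))
                break  # one Date line per block is assumed
    return res


lines1 = ["Date: 08/15/2010", "Time: 09:11:12", "From: Sue", "To: Tom", "Protocol: POP"]
lines2 = [
    "Data:3 xxxx 2010-08-15 09:11:12", "Sue -> Tom", "",
    "Data:7 xxxx 2010-08-15 10:00:00", "Fred -> Sue", "",
    "Data:23 xxxx 2010-07-15 09:11:12", "Fred -> ED",
]

pairs = match_by_date(lines1, lines2)
for block1, matches in pairs:
    print(block1[0], "->", len(matches), "matching block(s)")
```

Like the answer's code, this matches on the date alone, so blocks with the same date but a different time or sender still count as matches.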

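Hope it helps!

Answer 1: (score: 0)

OK, this took a little while, but I have a decent beta version for you to play with. I'm not quite sure how you want the output structured, but it's almost there.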

import datetime
import re


# Read file1 once; calling read() again on the same handle returns "".
with open("file1.txt", mode="r") as file1:
    text1 = file1.read()

# Removes the record prefix, e.g. "3 xxxx ", so "Data:" is followed by the date.
fixdate = re.compile(r"[0-9].*?\s[a-z]\D*")

# Drop the last three lines (the "." placeholder lines in the sample file).
with open("file2.txt", mode="r") as file2:
    lines2 = [line.strip() for line in file2.readlines()[:-3]]


for i in range(len(lines2)):
    line = re.sub(fixdate, "", lines2[i])
    if "Protocol:" in line:
        # Protocols named on this line, plus any on a continuation line.
        protos = line.split(":", 1)[1].split()
        if i + 1 < len(lines2):
            protos.extend(lines2[i + 1].split())
        for p in protos:
            if re.search("(?<=Protocol:) %s" % re.escape(p), text1):
                print("Matched Protocol: %s" % p)

    elif "Data" in line:
        # After fixdate, the line looks like "Data:2010-08-15 09:11:12".
        dt = line.replace("Data:", "").split(" ")
        d = datetime.datetime.strptime(dt[0], "%Y-%m-%d").strftime("%m/%d/%Y")
        if re.search("(?<=Date:) %s" % d, text1):
            print("Matched Date: %s" % d)
        if re.search("(?<=Time:) %s" % dt[1], text1):
            print("Matched Time: %s" % dt[1])

    elif "->" in line:
        # "Sue -> Tom": the left side is the sender, the right side the receiver.
        sender, receiver = [name.strip() for name in line.split("->")]
        if re.search("(?<=From:) %s" % re.escape(sender), text1):
            print("Matched Sender: %s" % sender)
        if re.search("(?<=To:) %s" % re.escape(receiver), text1):
            print("Matched Receiver: %s" % receiver)
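
The code above checks each field against the whole of file1, so it cannot tell whether the date, time, sender and receiver all come from the same block. A rough sketch of a block-level comparison follows. It assumes each file1 block is already available as a dictionary of its fields and each file2 block as a list of its lines. `key_from_file1`, `key_from_file2` and `DATA_RE` are hypothetical names, and comparing names case-insensitively is an assumption made because the sample data contains both "Ed" and "ED".

```python
import re
from datetime import datetime

# Date and time as they appear on a file2 "Data:" line.
DATA_RE = re.compile(r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})")


def key_from_file1(block):
    """Build a (date, time, sender, receiver) key from a dict of file1 fields."""
    date = datetime.strptime(block["Date"], "%m/%d/%Y").strftime("%Y-%m-%d")
    return (date, block["Time"], block["From"].lower(), block["To"].lower())


def key_from_file2(lines):
    """Build the same kind of key from the lines of one file2 block, or None."""
    stamp = sender = receiver = None
    for line in lines:
        if line.startswith("Data:"):
            stamp = DATA_RE.search(line)
        elif "->" in line:
            sender, receiver = (part.strip().lower() for part in line.split("->", 1))
    if stamp is None or sender is None:
        return None
    return (stamp.group(1), stamp.group(2), sender, receiver)


block1 = {"Date": "08/15/2010", "Time": "09:11:12", "From": "Sue", "To": "Tom", "Protocol": "POP"}
block2 = ["Data:3 xxxx 2010-08-15 09:11:12", "Type IV send now", "Sue -> Tom", "Protocol: SMTP"]
other = ["Data:45 xxxx 2010-06-15 09:11:12", "Type IV send now", "Fred -> Sue", "Protocol: POP", "SMTP"]

print(key_from_file1(block1) == key_from_file2(block2))
print(key_from_file1(block1) == key_from_file2(other))
```

With keys like these, the file2 blocks could be collected in a dictionary that maps each key to a list of blocks, so that each file1 block is looked up directly instead of rescanning file2.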