Compare text files on 2 column names given as input

Time: 2018-07-09 11:10:48

Tags: python python-2.7

I have 2 text files whose columns appear in arbitrary order.

File 1: file1.txt

ID|Name|Number|Date
1|John|991122|23-12-2017
2|Smith|889911|24-12-2017
3|Mak|776532|25-12-2017

File 2: file2.txt

Number|ID|Date|Name
991122|1|23-Dec-2017|John
889911|2|24-Dec-2017|Smith
776532|3|25-Dec-2017|Mak
987654|4|26-Dec-2017|Joseph
765551|5|27-Dec-2017|William

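I want to compare file1 and file2 on 2 specified columns and store the rows of file2.txt that have no match in file1.txt in an output .txt file.

Expected output file: output.txt, based on the specified columns ID and Date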

Number|ID|Date|Name
987654|4|26-Dec-2017|Joseph
765551|5|27-Dec-2017|William

Note: the Date column may have a different (unknown) format in either file.
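
Because the two files write the same date differently (23-12-2017 versus 23-Dec-2017), the Date values have to be brought to a common form before they can be compared. Below is a minimal sketch of one way to do that, assuming the set of possible formats can be listed in advance. The name `normalize_date` and the `CANDIDATE_FORMATS` tuple are hypothetical and not part of the original post; the sketch is written for Python 3, and `%b` depends on the locale (English month abbreviations are assumed).

```python
from datetime import datetime

# Hypothetical list of formats to try, in order; extend it to match the real data.
CANDIDATE_FORMATS = ('%d-%m-%Y', '%d-%b-%Y', '%Y-%m-%d')


def normalize_date(text, formats=CANDIDATE_FORMATS):
    """Return the date as an ISO string (YYYY-MM-DD), trying each format in turn."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format did not fit, try the next one
    raise ValueError('unrecognized date format: %r' % text)


same = normalize_date('23-12-2017') == normalize_date('23-Dec-2017')
```

Guessing has limits: a value such as 01-02-2017 fits both a day-first and a month-first format, so the order of the candidate list decides how it is read. A truly unknown format cannot be resolved reliably this way.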

Attempt:

# Raw strings keep the backslashes in Windows paths from being read as escapes
file1 = r'E:\Python\File Comparison Files\File1.txt'
file2 = r'E:\Python\File Comparison Files\File2.txt'
file3 = r'E:\Python\File Comparison Files\outputfile.txt'

with open(file1) as b:
    first_line_b = b.readline()
    print 'File1 Columns:', first_line_b

file1Column1 = raw_input('Enter File1 column1 name to compare:')
file1Column2 = raw_input('Enter File1 column2 name to compare:')


with open(file2) as a:
    first_line_a = a.readline()
    print '\nFile2 Columns:', first_line_a

file2Column1 = raw_input('Enter File2 column1 name to compare:')
file2Column2 = raw_input('Enter File2 column2 name to compare:')

# The following compares whole lines, not the specified columns
with open(file1) as b:
    blines = set(b)
with open(file2) as a:
    first_line = a.readline()
    with open(file3, 'w') as result:
        result.write(first_line)
        for line in a:
            if line not in blines:
                result.write(line)

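The code above compares complete lines, not the specified columns/fields. I want to compare on the two columns passed in for each file and store the result in a third file.

1 Answer:

Answer 0: (score: 0)

You can do this efficiently with csv.DictReader and csv.DictWriter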

import csv

file1, file2, file3 = 'file1.txt', 'file2.txt', 'file3.txt'
col_name = raw_input('Enter File1 column1 name to compare:')

uids = set()
with open(file1) as fo1:
    for row in csv.DictReader(fo1, delimiter='|'):
        uids.add(row[col_name])

with open(file2, 'rb') as fo2:
    # Python 2: the csv module expects binary mode, otherwise Windows writes \r\r\n
    with open(file3, 'wb') as fo3:
        reader = csv.DictReader(fo2, delimiter='|')
        writer = csv.DictWriter(fo3, delimiter='|', fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[col_name] in uids:
                continue
            writer.writerow(row)

file3.txt should now contain

Number|ID|Date|Name
987654|4|26-Dec-2017|Joseph
765551|5|27-Dec-2017|William
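
The code above compares on a single column that must have the same name in both files. A minimal sketch of how the same idea might extend to two columns per file is to build a tuple key from the chosen columns. The function names `make_key` and `rows_missing_from_first`, and the `normalize` parameter, are hypothetical and not part of the original answer. The sketch targets Python 3 and uses in-memory stand-ins for the two files. It compares on ID and Name; a Date column with differing formats would need its own normalizing function passed as `normalize`.

```python
import csv
import io


def make_key(row, columns, normalize):
    """Build a tuple key from the named columns of one row."""
    return tuple(normalize(row[name]) for name in columns)


def rows_missing_from_first(f1, f2, out, cols1, cols2, normalize=str.strip):
    """Write rows of f2 whose key does not occur in f1; return how many were written."""
    seen = set()
    for row in csv.DictReader(f1, delimiter='|'):
        seen.add(make_key(row, cols1, normalize))

    reader = csv.DictReader(f2, delimiter='|')
    writer = csv.DictWriter(out, delimiter='|', fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    written = 0
    for row in reader:
        if make_key(row, cols2, normalize) not in seen:
            writer.writerow(row)
            written += 1
    return written


# In-memory stand-ins for file1.txt and file2.txt from the question.
file1 = io.StringIO(
    "ID|Name|Number|Date\n"
    "1|John|991122|23-12-2017\n"
    "2|Smith|889911|24-12-2017\n"
    "3|Mak|776532|25-12-2017\n"
)
file2 = io.StringIO(
    "Number|ID|Date|Name\n"
    "991122|1|23-Dec-2017|John\n"
    "889911|2|24-Dec-2017|Smith\n"
    "776532|3|25-Dec-2017|Mak\n"
    "987654|4|26-Dec-2017|Joseph\n"
    "765551|5|27-Dec-2017|William\n"
)
out = io.StringIO()
count = rows_missing_from_first(file1, file2, out, ['ID', 'Name'], ['ID', 'Name'])
```

The column lists are passed separately for each file, so the names may differ between them as long as they are given in the same order. With real files in Python 3, opening them with `newline=''` is what the csv documentation recommends.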