Find a regular expression and append it

Time: 2017-02-05 15:11:50

Tags: python csv

I am trying to compare two CSV files to find the differences. The two files look like this:

['John'], ['Johnson'], ['1337@john-johnson.pro']
['Steve'], ['Stevens'], ['s.stevens@company.com']
['Sarah'], ['Stevens'], ['sarah.stevens@company.com']

['John'], ['Johnson'], ['1337@john-johnson.pro']
...
['Richard'], ['McBait'], ['ilovecats123@mail.mcbait.com']

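What I want to do is compare the two files without having to create temporary files. The script should strip out the characters [, ] and ', read the values, and then compare the two files against each other; the difference represents the "new users".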

I am using the following (possibly flawed) logic to solve this:

read the file -> execute subprocess (tr -d \[\]\') -> save output to file1_temp -> read the file1_temp -> convert to set -> compare (.difference) with file2_tmp

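So the question is: is there a faster way to do this? In Perl, for example, I would test each line against a regular expression in an if statement to decide which data gets read.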

1 Answer:

Answer 0: (score: 0)

Assuming that

['John'], ['Johnson'], ['1337@john-johnson.pro']

and

['John'], ['Johnson'], ['1337@john-johnson.pro']

count as the same row in your case, load each CSV file into a list (in memory) and take the delta of the two lists (using set).

file1.csv:

['John'], ['Johnson'], ['1337@john-johnson.pro']
['Steve'], ['Stevens'], ['s.stevens@company.com']
['Sarah'], ['Stevens'], ['sarah.stevens@company.com']

file2.csv:

['John'], ['Johnson'], ['1337@john-johnson.pro']
['Steve'], ['Stevens'], ['s.stevens@company.com']
['Richard'], ['McBait'], ['ilovecats123@mail.mcbait.com']

Here is the code:

>>> import csv
>>> with open('file1.csv', newline='') as f:
...     # build the list while the file is still open
...     list1 = [tuple(row) for row in csv.reader(f)]
...
>>> with open('file2.csv', newline='') as f:
...     list2 = [tuple(row) for row in csv.reader(f)]
...
>>> delta = list(set(list2) - set(list1))  # rows in file2.csv that are not in file1.csv
>>> print(delta)
[("['Richard']", " ['McBait']", " ['ilovecats123@mail.mcbait.com']")]
>>> # strip the surrounding whitespace, brackets and quotes from every field
>>> clean_delta = [tuple(x.strip().strip("[]'") for x in y) for y in delta]
>>> print(clean_delta)
[('Richard', 'McBait', 'ilovecats123@mail.mcbait.com')]
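
Since the question mentions regular expressions, here is a rough sketch of the same comparison that uses a regex to extract the values while reading, so no temporary files and no separate cleanup step are needed. The names FIELD_RE, read_users and new_users are made up for this illustration, and the pattern assumes every field is written exactly as ['value'], as in the sample files.

```python
import re

# Assumption: each field looks like ['value']; capture the text in between.
FIELD_RE = re.compile(r"\['(.*?)'\]")


def read_users(path):
    """Return the rows of `path` as a set of tuples of cleaned field values."""
    users = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = tuple(FIELD_RE.findall(line))
            if fields:  # ignore blank lines and lines without any ['...'] field
                users.add(fields)
    return users


def new_users(old_path, new_path):
    """Return the rows that appear in `new_path` but not in `old_path`."""
    return read_users(new_path) - read_users(old_path)
```

With the two sample files above, new_users('file1.csv', 'file2.csv') should return a set containing only the Richard McBait row. Swapping the arguments gives the rows that were removed instead.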