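I need to check whether two rows in a file have the same values in columns 1 and 2. If they don't, the row should be written to a separate 'output' file. If they do, only the newer of the two rows, based on the third column (the timestamp), should be written to the 'output' file.
The snippet below compares whole lines rather than individual columns. How can I make it compare by columns?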
#!/usr/bin/python
import os,sys,csv
file_open= sys.argv[1]
with open (file_open,'r') as f1, open ('output.txt','w+') as f2:
    lines2 = f2.readlines()
    for line in f1:
        if line not in lines2:
            f2.write(line)
Sample input:
1,A,28/04/17 10:57:28.096
3,A,28/04/17 10:57:46.950
1,A,28/04/17 10:59:16.969
3,A,28/04/17 11:02:09.341
4,A,28/04/17 11:03:09.432
Expected output:
1,A,28/04/17 10:59:16.969
3,A,28/04/17 11:02:09.341
4,A,28/04/17 11:03:09.432
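To compare by columns instead of whole lines, each line can be split into fields and only the first two used as a comparison key. A minimal sketch, assuming the five sample input rows above are held in a string for illustration (the names `SAMPLE`, `whole_lines` and `keys` are hypothetical):

```python
import csv
import io

# The question's sample input rows, inlined here for illustration only.
SAMPLE = """1,A,28/04/17 10:57:28.096
3,A,28/04/17 10:57:46.950
1,A,28/04/17 10:59:16.969
3,A,28/04/17 11:02:09.341
4,A,28/04/17 11:03:09.432
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))

# Whole-line comparison: all five lines are distinct because the timestamps differ.
whole_lines = SAMPLE.splitlines()
print(len(set(whole_lines)))  # 5

# Column comparison: the key is built from columns 1 and 2 only.
keys = [(row[0], row[1]) for row in rows]
print(len(set(keys)))  # 3
print(keys[0] == keys[2])  # True: both rows have the key ('1', 'A')
```

Once each row has a key like this, the duplicate check and the "keep the newest" logic can work on the key rather than on the full line.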
Answer 0 (score: 0)
Since you are already importing the csv module, I suggest you use it.
import sys
import csv
seen = set()
file_open = sys.argv[1]
with open(file_open, 'r') as f1, open('output.txt','w') as f2:
    reader = csv.reader(f1)
    writer = csv.writer(f2)
    for line in reader:
        if not line:  # a quick check to make sure it's a valid (non-empty) line
            continue
        if (line[0], line[1]) not in seen:
            seen.add((line[0], line[1]))
            writer.writerow(line)
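Before writing a row, this code checks that no row with the same first and second columns has been seen yet. Tuples are hashable, so the column pair can be stored in a set, which makes this easy. Note that this keeps the first row for each pair, not the one with the latest timestamp, as the output shows.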
Output:
1,A,28/04/17 10:57:28.096
3,A,28/04/17 10:57:46.950
4,A,28/04/17 11:03:09.432
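The set-based check works because a tuple is hashable while a list is not. A small sketch of both points, wrapping the same logic in a function so it can run on in-memory rows (the helper name `first_per_key` and the rows are hypothetical, taken from the sample data):

```python
def first_per_key(rows):
    """Keep the first row seen for each (column 1, column 2) pair."""
    seen = set()
    kept = []
    for row in rows:
        if not row:  # skip empty rows
            continue
        key = (row[0], row[1])  # a tuple is hashable, so it can be a set member
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# A list cannot be a set member because it is mutable and unhashable.
try:
    {["1", "A"]}
except TypeError as exc:
    print("list key rejected:", exc)

rows = [
    ["1", "A", "28/04/17 10:57:28.096"],
    ["1", "A", "28/04/17 10:59:16.969"],
    ["4", "A", "28/04/17 11:03:09.432"],
]
# The later ("1", "A") row is dropped: the first occurrence wins.
print(first_per_key(rows))
```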
Answer 1 (score: 0)
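A modified version of @Coldspeed's code that uses an OrderedDict to keep the latest entry by timestamp (assuming the timestamps appear in chronological order in the file).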
import sys
import csv
from collections import OrderedDict
history = OrderedDict()
file_open = sys.argv[1]
with open(file_open, 'r') as f1, open('output.txt','w') as f2:
    reader = csv.reader(f1)
    writer = csv.writer(f2)
    for line in reader:
        if not line:  # valid line check
            continue
        history[(line[0], line[1])] = line[2]  # adds the key if new, updates it if present
    for (col1, col2), timestamp in history.items():
        writer.writerow([col1, col2, timestamp])
Contents of output.txt:
1,A,28/04/17 10:59:16.969
3,A,28/04/17 11:02:09.341
4,A,28/04/17 11:03:09.432
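The OrderedDict version above relies on rows arriving in timestamp order. If that is not guaranteed, the timestamps can be parsed and compared explicitly. A minimal sketch, assuming the day/month/two-digit-year format of the sample data and Python 3.7+, where a plain `dict` preserves insertion order (`latest_per_key` and `TS_FORMAT` are hypothetical names):

```python
from datetime import datetime

# Assumed timestamp format, matching the sample data (day/month/2-digit year).
TS_FORMAT = "%d/%m/%y %H:%M:%S.%f"


def latest_per_key(rows):
    """Keep, for each (column 1, column 2) pair, the row with the latest timestamp."""
    best = {}  # key -> (parsed timestamp, row); insertion-ordered in Python 3.7+
    for row in rows:
        if not row:  # skip empty rows
            continue
        key = (row[0], row[1])
        ts = datetime.strptime(row[2], TS_FORMAT)
        if key not in best or ts > best[key][0]:
            best[key] = (ts, row)  # updating an existing key keeps its position
    return [row for _, row in best.values()]


# Out-of-order input: the newer "1,A" row comes before the older one.
rows = [
    ["1", "A", "28/04/17 10:59:16.969"],
    ["3", "A", "28/04/17 10:57:46.950"],
    ["1", "A", "28/04/17 10:57:28.096"],
    ["3", "A", "28/04/17 11:02:09.341"],
]
for row in latest_per_key(rows):
    print(",".join(row))
```

Each key keeps the position of its first appearance, as with the OrderedDict version, but the stored row is chosen by comparing parsed timestamps rather than by file order.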
Answer 2 (score: 0)
A short solution using the itertools.groupby() function and the datetime module (the date strings are compared as parsed datetimes):
import sys, csv, itertools, datetime, operator
with open(sys.argv[1], 'r') as in_csv, open('output.csv', 'w') as out_csv:
    reader = csv.reader(in_csv)
    lines = [max(g, key=lambda x: datetime.datetime.strptime(x[2], '%d/%m/%y %H:%M:%S.%f'))
             for k, g in itertools.groupby(sorted(reader, key=lambda r: (r[0], r[1])), key=operator.itemgetter(0, 1))]
    writer = csv.writer(out_csv, lineterminator='\n')
    for l in lines:
        writer.writerow(l)
Contents of output.csv:
1,A,28/04/17 10:59:16.969
3,A,28/04/17 11:02:09.341
4,A,28/04/17 11:03:09.432
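The groupby solution sorts the rows first because itertools.groupby() only groups consecutive items with equal keys. It also parses the timestamps with datetime.strptime() rather than comparing the raw strings, since day-first strings do not sort chronologically. A short illustration of both points; the rows come from the sample data, while the dates `a` and `b` are illustrative values, not from the question:

```python
import itertools
import operator
from datetime import datetime

rows = [
    ["1", "A", "28/04/17 10:57:28.096"],
    ["3", "A", "28/04/17 10:57:46.950"],
    ["1", "A", "28/04/17 10:59:16.969"],
]
key = operator.itemgetter(0, 1)

# Unsorted: the two ("1", "A") rows are not adjacent, so they form separate groups.
unsorted_keys = [k for k, _ in itertools.groupby(rows, key=key)]
print(unsorted_keys)  # [('1', 'A'), ('3', 'A'), ('1', 'A')]

# Sorted by the same key: each column pair forms exactly one group.
sorted_keys = [k for k, _ in itertools.groupby(sorted(rows, key=key), key=key)]
print(sorted_keys)  # [('1', 'A'), ('3', 'A')]

# Day-first strings compare character by character, not chronologically.
a, b = "30/04/17 09:00:00.000", "01/05/17 09:00:00.000"
print(a > b)  # True, even though a is the earlier date
fmt = "%d/%m/%y %H:%M:%S.%f"
print(datetime.strptime(a, fmt) < datetime.strptime(b, fmt))  # True
```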