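I have a csv file in which each record is a LinkedIn contact. I need to create another csv file containing only the contacts that connected after a specific date (for example, all the contacts that connected with me after January 1, 2017). Here is my implementation: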
import csv
from collections import OrderedDict
from dateutil import parser

def import_from_csv(file):
    key_order = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
    linkedin_contacts = []
    with open(file, encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        for row in reader:
            single_person = {"FirstName": row["FirstName"], "LastName": row["LastName"],
                             "EmailAddress": row["EmailAddress"], "Company": row["Company"],
                             "ConnectedOn": parser.parse(row["ConnectedOn"])}
            od = OrderedDict((k, single_person[k]) for k in key_order)
            linkedin_contacts.append(od)
    return linkedin_contacts
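This first script gives me a list of ordered dicts. I don't know whether the way I get the right order is a good one. I have also seen examples (like here) that use the od.update method, which I don't use, but I don't think I need it. Is that correct?
Then I wrote a second function to filter the list: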
def filter_by_date(connections):
    filtered_list = []
    target_date = parser.parse("01/04/2017")
    for row in connections:
        if row["ConnectedOn"] > target_date:
            filtered_list.append(row)
    return filtered_list
Am I doing this the right way?
Is there a way to optimize the code? Thanks.
Answer 0: (score: 1)
For the filtering, you can use the filter() function:
from datetime import datetime

def filter_by_date(connections):
    # The format must match the text; day/month/year is assumed here
    target_date = datetime.strptime("01/04/2017", '%d/%m/%Y').date()
    return list(filter(lambda x: x["ConnectedOn"] > target_date, connections))
Instead of creating a plain dict and then copying its values into an OrderedDict, you can write the values directly into the OrderedDict:
for row in reader:
    od = OrderedDict()
    od["FirstName"] = row["FirstName"]
    od["LastName"] = row["LastName"]
    od["EmailAddress"] = row["EmailAddress"]
    od["Company"] = row["Company"]
    # Assumes dates are written day/month/year; adjust the format to your file
    od["ConnectedOn"] = datetime.strptime(row["ConnectedOn"], '%d/%m/%Y').date()
    linkedin_contacts.append(od)
If you know the date format, you don't need python_dateutil; you can use the built-in datetime.datetime.strptime() with the format you need.
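To make the point about knowing the format concrete, here is a minimal sketch. The sample string and both format strings are assumptions chosen for illustration; which one matches a real LinkedIn export has to be checked against the file itself.

```python
from datetime import datetime

date_text = "01/04/2017"  # sample value, assumed for illustration

# The same text means different days depending on the format you declare.
day_first = datetime.strptime(date_text, "%d/%m/%Y").date()    # 1 April 2017
month_first = datetime.strptime(date_text, "%m/%d/%Y").date()  # 4 January 2017

print(day_first.isoformat())    # 2017-04-01
print(month_first.isoformat())  # 2017-01-04

# A format that does not match the text raises ValueError instead of guessing.
rejected = False
try:
    datetime.strptime(date_text, "%Y/%m/%d")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

Because strptime() fails loudly on a mismatch, a wrong format shows up on the first row rather than silently producing wrong dates.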
Answer 1: (score: 1)
The format string has to match the date text exactly.
Use:
from datetime import datetime
format = '%d/%m/%Y'
date_text = '01/04/2017'
# the inverse operation is some_datetime.strftime(format)
datetime.strptime(date_text, format)
#....
# with format as a global
for row in reader:
    od = OrderedDict()
    od["FirstName"] = row["FirstName"]
    od["LastName"] = row["LastName"]
    od["EmailAddress"] = row["EmailAddress"]
    od["Company"] = row["Company"]
    od["ConnectedOn"] = datetime.strptime(row["ConnectedOn"], format)
    linkedin_contacts.append(od)
Then do:
def filter_by_date(connections, date_text):
    target_date = datetime.strptime(date_text, format)
    return [x for x in connections if x["ConnectedOn"] > target_date]
Answer 2: (score: 1)
First point: you don't need an OrderedDict at all; just use a csv.DictWriter to write the filtered csv.
fieldnames = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
# Python 3: open csv files in text mode with newline=""
with open("/path/to/final.csv", "w", newline="", encoding="utf8") as f:
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    writer.writerows(filtered_contacts)
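As a rough illustration of why the column order comes from the writer and not from the dicts, here is a self-contained sketch that writes to an in-memory buffer. The contact row and the extra "Position" column are made up for the example, and extrasaction="ignore" is an optional setting added here, not something the snippet above relies on.

```python
import csv
import io
from datetime import date

fieldnames = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")

# Hypothetical row: keys in arbitrary order, plus one column we do not want.
contacts = [
    {"ConnectedOn": date(2017, 5, 2), "Company": "Acme", "LastName": "Doe",
     "FirstName": "Jane", "EmailAddress": "jane@example.com", "Position": "CTO"},
]

buffer = io.StringIO(newline="")
# fieldnames fixes the column order; extrasaction="ignore" skips unlisted keys
writer = csv.DictWriter(buffer, fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerows(contacts)

output = buffer.getvalue()
print(output)
```

Without extrasaction="ignore", DictWriter raises ValueError when a row contains a key that is not in fieldnames, which is worth knowing if the source file has more columns than you want to keep.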
Second point: you don't need to build a new dict from the one the csv reader already gives you; just update the ConnectedOn key:
def import_from_csv(file):
    linkedin_contacts = []
    with open(file, encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        for row in reader:
            row["ConnectedOn"] = parser.parse(row["ConnectedOn"])
            linkedin_contacts.append(row)
    return linkedin_contacts
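Finally, if all you want to do is take the source csv, filter the records on ConnectedOn, and write out the result, you don't need to load the whole source into memory, build a filtered list (again in memory), and then write that list out; you can stream the whole operation: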
def filter_csv(source_path, dest_path, date):
    fieldnames = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
    target = parser.parse(date)
    # Python 3: text mode with newline="" instead of "rb"/"wb"
    with open(source_path, encoding="utf8", newline="") as source, \
         open(dest_path, "w", encoding="utf8", newline="") as dest:
        reader = csv.DictReader(source)
        # extrasaction="ignore" drops any source columns not listed in fieldnames
        writer = csv.DictWriter(dest, fieldnames, extrasaction="ignore")
        # if you want a header line with the fieldnames - else comment it out
        writer.writeheader()
        for row in reader:
            row_date = parser.parse(row["ConnectedOn"])
            if row_date > target:
                writer.writerow(row)
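And there you have it, as simple as that.
Note: I don't know what "parser.parse()" is, but as the other answers mention, you would probably be better off using the datetime module.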
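For what it's worth, the streaming approach could look roughly like this with only the standard library. This is a sketch under assumptions: filter_rows is a hypothetical helper name, the day/month/year format is a guess that must be checked against the real file, and the sample rows are made up. It works on any open text streams, so in-memory buffers stand in for the files here.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumption: dates are written day/month/year
FIELDNAMES = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")


def filter_rows(source, dest, date_text):
    """Copy rows whose ConnectedOn is strictly after date_text from source to dest."""
    target = datetime.strptime(date_text, DATE_FORMAT)
    reader = csv.DictReader(source)
    writer = csv.DictWriter(dest, FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        # Rows are handled one at a time, so nothing accumulates in memory.
        if datetime.strptime(row["ConnectedOn"], DATE_FORMAT) > target:
            writer.writerow(row)


# Made-up sample data standing in for the real export.
source = io.StringIO(
    "FirstName,LastName,EmailAddress,Company,ConnectedOn\n"
    "Jane,Doe,jane@example.com,Acme,15/06/2017\n"
    "John,Roe,john@example.com,Initech,20/11/2016\n"
)
dest = io.StringIO(newline="")
filter_rows(source, dest, "01/01/2017")
result = dest.getvalue()
print(result)
```

With real files, you would pass the objects returned by open(source_path, encoding="utf8", newline="") and open(dest_path, "w", encoding="utf8", newline="") instead of the StringIO buffers.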