I have this code:
def merge_new_old_urls(urls_list, urls_file_path):
    url_dict = {}
    try:
        with open(urls_file_path, "r") as f:
            data = f.readlines()
        for line in data:
            # read what is already in file
            url_dict = {line.split()[0]: int(line.split()[1])}
        for new in urls_list:
            for key in url_dict.keys():
                if new == key:
                    print 'found'
                    url_dict[key] += 1
                else:
                    url_dict[new] = 1
    except IOError:
        logging.critical('no files to read from %s' % urls_file_path)
        raise IOError('no files to read from %s' % urls_file_path)
    return url_dict
This is supposed to read data from a file, merge it with a new list of URLs, and count how many times each URL occurs. The file with the old URLs looks like this:
http://aaa.com 1
http://bbb.com 2
http://ccc.com 1
If the new list of URLs contains http://aaa.com and http://bbb.com, the dict should be:
'http://aaa.com':2
'http://bbb.com':3
'http://ccc.com':1
But my code doesn't work correctly. Can anyone fix it?
Answer (score: 2)
You redefine url_dict on every pass through the loop, so it only ever holds the entry from the last line read:

url_dict = {line.split()[0]: int(line.split()[1])}
Add the entries to the dict as you go instead (note that the value needs to be converted with int(), since split() returns strings):

for line in data:
    key, val = line.split()
    if key in url_dict:
        url_dict[key] += int(val)
    else:
        url_dict[key] = int(val)
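If you prefer, the same update can be written without the explicit membership test by using dict.get with a default of 0; this is just a small sketch, still assuming each line in the file is a URL followed by a count:

for line in data:
    key, val = line.split()
    # get() falls back to 0 for URLs that have not been seen yet
    url_dict[key] = url_dict.get(key, 0) + int(val)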
You don't need to search the whole dict at all; the in test already does the lookup for you, and each new URL simply adds one to its count:

for key in urls_list:
    if key in url_dict:
        url_dict[key] += 1
    else:
        url_dict[key] = 1
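As an aside, both loops can be collapsed with collections.Counter, which handles missing keys for you and can count an iterable directly; this is only a sketch, assuming data and urls_list have the same shapes as in the question:

from collections import Counter

url_counts = Counter()
for line in data:
    key, val = line.split()
    url_counts[key] += int(val)   # counts already stored in the file
url_counts.update(urls_list)      # each new URL adds 1 to its count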
Finally, you shouldn't put everything inside the try block; keep only the file access in it:
try:
    with open(urls_file_path, "r") as f:
        data = f.readlines()
except IOError:
    logging.critical('no files to read from %s' % urls_file_path)
    raise IOError('no files to read from %s' % urls_file_path)
else:
    # rest of your code
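Putting it all together, the corrected function could look roughly like this sketch, which keeps the question's logging call and exception message:

import logging

def merge_new_old_urls(urls_list, urls_file_path):
    url_dict = {}
    try:
        with open(urls_file_path, "r") as f:
            data = f.readlines()
    except IOError:
        logging.critical('no files to read from %s' % urls_file_path)
        raise IOError('no files to read from %s' % urls_file_path)
    else:
        # merge the counts already stored in the file
        for line in data:
            key, val = line.split()
            url_dict[key] = url_dict.get(key, 0) + int(val)
        # every new URL adds one to its count
        for key in urls_list:
            url_dict[key] = url_dict.get(key, 0) + 1
        return url_dict

With the sample file above and urls_list = ['http://aaa.com', 'http://bbb.com'] this returns {'http://aaa.com': 2, 'http://bbb.com': 3, 'http://ccc.com': 1}, which matches the expected output from the question.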