I have the following Python code. However, I am having trouble removing duplicate links from the results.
search_results_links = []
for i in range(len(search_results)):
    if search_results[i]['href'] == "":
        continue
    elif (search_results[i]['href'][0] == "/"):
        search_results_links.append("https://www.census.gov"+search_results[i]['href'])
    elif (search_results[i]['href'][0] == "#") :
        continue
    elif (search_results[i]['href'][0] == "j") :
        continue
    else:
        search_results_links.append(search_results[i]['href'])
# Remove duplicates.
search_results_links.sort()
search_results_links2 = []
for i in range(len(search_results_links)):
    if search_results_links[i][:-1] == search_results_links[i - 1]:
        continue
    else:
        search_results_links2.append(search_results_links[i])
How can I update this code so that it extracts only unique links?
Answer 0 (score: 0)
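Instead of storing all the links in a list, use a set.
Assuming everything else in your code works correctly, if you check whether a link is already in the set before adding it, you no longer need a separate duplicate-removal step. (The check is optional: adding an element that a set already contains has no effect.) Like this: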
search_results_links = set()
for i in range(len(search_results)):
    if search_results[i]['href'] == "":
        # Skip empty hrefs.
        continue
    elif (search_results[i]['href'][0] == "/"):
        # Site-relative link: prefix the domain before storing it.
        if "https://www.census.gov"+search_results[i]['href'] not in search_results_links:
            search_results_links.add("https://www.census.gov"+search_results[i]['href'])
    elif (search_results[i]['href'][0] == "#") :
        # Skip in-page anchors.
        continue
    elif (search_results[i]['href'][0] == "j") :
        # Skip hrefs starting with "j" (presumably javascript: links).
        continue
    else:
        if search_results[i]['href'] not in search_results_links:
            search_results_links.add(search_results[i]['href'])
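Two side notes, offered as a sketch rather than a definitive fix. First, the comparison in the question, `search_results_links[i][:-1] == search_results_links[i - 1]`, drops the last character of the current link before comparing, so two identical links never compare equal. That is likely why the duplicates survived. Second, a set does not preserve order. If the original order matters, one option is `dict.fromkeys`, which keeps the first occurrence of each key (dictionaries preserve insertion order in Python 3.7+). The function name `unique_links` and the sample data below are hypothetical. The sketch assumes each item in `search_results` supports `item['href']`, and it skips only `javascript:` links rather than every href starting with "j".

```python
BASE_URL = "https://www.census.gov"  # domain taken from the question's code


def unique_links(search_results, base_url=BASE_URL):
    """Return absolute links in first-seen order with duplicates removed.

    Hypothetical helper: assumes each item supports item['href'].
    """
    links = []
    for result in search_results:
        href = result["href"]
        # Skip empty hrefs, in-page anchors, and javascript: pseudo-links.
        if not href or href.startswith("#") or href.startswith("javascript:"):
            continue
        # Prefix site-relative links with the domain; keep others unchanged.
        links.append(base_url + href if href.startswith("/") else href)
    # dict keys are unique and keep insertion order, so this drops repeats
    # while preserving the position of each first occurrence.
    return list(dict.fromkeys(links))


# Made-up sample data standing in for the scraped results.
sample = [
    {"href": ""},
    {"href": "/data"},
    {"href": "#top"},
    {"href": "javascript:void(0)"},
    {"href": "https://example.com/a"},
    {"href": "/data"},
    {"href": "https://example.com/a"},
]
print(unique_links(sample))
# prints ['https://www.census.gov/data', 'https://example.com/a']
```

If order does not matter, `set(links)` gives the same elements with less ceremony.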