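I'm writing the code below and have run into a frustrating problem that I still can't solve after two days of being stuck.
Here is the simplified code: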
    def crawl_web(url, depth):
        toCrawl = [url]
        crawled = ['https://index.html']
        i = 0
        while i <= depth:
            interim = []
            for x in toCrawl:
                if x not in toCrawl and x not in crawled and x not in interim:
                    print("NOT IN")
                crawled.append(x)
            toCrawl = interim
            i += 1
        return crawled

    print(crawl_web("https://index.html", 1))
The result I expect is just:
    ['https://index.html']
But somehow the "if not in" check doesn't work, and I keep getting this output:
    ['https://index.html', 'https://index.html']
Answer 0 (score: 2)
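crawled.append is called regardless of what the if statement evaluates to, because it sits at the same indentation level as the if statement rather than inside it. You need to move it into the body of the if: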
    def crawl_web(url, depth):
        toCrawl = [url]
        crawled = ['https://index.html']
        i = 0
        while i <= depth:
            interim = []
            for x in toCrawl:
                if x not in toCrawl and x not in crawled and x not in interim:
                    print("NOT IN")
                    # Now only runs when the condition above is true.
                    crawled.append(x)
            toCrawl = interim
            i += 1
        return crawled

    print(crawl_web("https://index.html", 1))
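
One further observation, offered as a sketch and not as part of the original answer: since x is taken from toCrawl, the test `x not in toCrawl` is always false, so with the fix above nothing is ever appended. A version that actually walks links might look like the following. It drops that test, starts with an empty crawled list, and tracks visited pages in a set. `get_links` and `FAKE_LINKS` are hypothetical stand-ins for real page fetching, and the extra URLs are made up for illustration.

```python
# Hypothetical in-memory link graph standing in for real HTTP fetching
# (an assumption for illustration; the URLs other than index are made up).
FAKE_LINKS = {
    "https://index.html": ["https://a.html", "https://b.html"],
    "https://a.html": ["https://index.html", "https://b.html"],
    "https://b.html": [],
}


def get_links(url):
    """Hypothetical helper: return the outgoing links of a page."""
    return FAKE_LINKS.get(url, [])


def crawl_web(url, depth):
    to_crawl = [url]
    crawled = []   # pages in the order they were visited
    seen = set()   # same pages, kept in a set for fast membership tests
    for _ in range(depth + 1):
        interim = []  # pages discovered at this level, crawled at the next
        for x in to_crawl:
            if x in seen:
                continue  # already visited, skip duplicates
            seen.add(x)
            crawled.append(x)
            for link in get_links(x):
                if link not in seen and link not in interim:
                    interim.append(link)
        to_crawl = interim
    return crawled


print(crawl_web("https://index.html", 1))
```

With depth 0 only the start page is visited; each extra level of depth follows the links found at the previous level, and no page is recorded twice.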