This code was working before:

import logging

from scrapy.dupefilters import RFPDupeFilter


class CustomFilter(RFPDupeFilter):
    def request_seen(self, request):
        # Use the raw URL as the fingerprint instead of Scrapy's default request hash.
        fp = request.url
        if fp in self.fingerprints:  # we have visited this URL before
            logging.info("Ignoring url=%r", request.url)
            return True
        else:  # we are about to scrape a details page URL
            self.fingerprints.add(fp)
            logging.info("Scraping url=%r", request.url)
            if self.file:
                # The file is opened in text mode, so "\n" is enough here.
                self.file.write(fp + "\n")
            else:
                print("why is that happening?")
            return False

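I specified the dupefilter class in my settings, and I keep getting "why is that happening?", which means self.file is not set because the file does not exist.
I read the official RFPDupeFilter code, and there should already be a file named requests.seen, but for some reason it is not being generated :(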
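
One possible explanation, as far as I can tell from the RFPDupeFilter source: the filter only opens requests.seen when it is given a directory path, and that path comes from the JOBDIR setting. Without it, self.file stays None. The sketch below is a simplified stand-in written with the standard library only, not Scrapy's actual code; the names SketchDupeFilter and demo are made up for illustration.

```python
import os
import tempfile


class SketchDupeFilter:
    """Simplified model of the filter's file handling (hypothetical, not Scrapy code)."""

    def __init__(self, path=None):
        self.file = None
        self.fingerprints = set()
        if path:
            # The file is only opened when a job directory is supplied.
            self.file = open(os.path.join(path, "requests.seen"), "a+")
            self.file.seek(0)
            self.fingerprints.update(line.rstrip() for line in self.file)

    def request_seen(self, url):
        if url in self.fingerprints:
            return True
        self.fingerprints.add(url)
        if self.file:
            self.file.write(url + "\n")
        return False

    def close(self):
        if self.file:
            self.file.close()


def demo():
    # No directory given: no file object, which matches the symptom above.
    no_dir = SketchDupeFilter()
    with tempfile.TemporaryDirectory() as job_dir:
        # Directory given: requests.seen is created inside it.
        with_dir = SketchDupeFilter(job_dir)
        first = with_dir.request_seen("http://example.com/a")
        second = with_dir.request_seen("http://example.com/a")
        with_dir.close()
        created = os.path.exists(os.path.join(job_dir, "requests.seen"))
    return no_dir.file is None, created, first, second
```

If this is indeed the cause, running the crawl with a job directory set (for example passing -s JOBDIR=some/dir on the command line, where some/dir is a placeholder) should make the file appear.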