I tried the following, but these pages are still being crawled:
    rules = (
        Rule(SgmlLinkExtractor(deny=r'/preferences'), follow=False),
        Rule(SgmlLinkExtractor(deny=r'/auth'), follow=False),
    )
What am I doing wrong?
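One possible reading of the problem can be sketched without Scrapy. As far as I understand, a CrawlSpider applies each Rule's link extractor independently, so a link denied by one rule can still be extracted by the other. The helper `extract` below is a hypothetical stand-in for a link extractor's deny filtering, and the example.com URLs are made up for illustration.

```python
import re


def extract(links, deny):
    """Hypothetical stand-in for a link extractor: keep links matching no deny pattern."""
    patterns = [deny] if isinstance(deny, str) else list(deny)
    return [u for u in links if not any(re.search(p, u) for p in patterns)]


# Made-up links found on a page.
links = [
    "http://example.com/items/1",
    "http://example.com/preferences",
    "http://example.com/auth/login",
]

# Two separate rules, one deny pattern each: the union of both results gets crawled.
separate = set(extract(links, r"/preferences")) | set(extract(links, r"/auth"))

# One rule that denies both patterns at once.
combined = set(extract(links, (r"/preferences", r"/auth")))

print(sorted(separate))
print(sorted(combined))
```

If this reading is right, a single rule whose extractor receives both patterns, `deny=(r'/preferences', r'/auth')`, would be the thing to try.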
I have also tried this middleware:
    class URLFilterMiddleware(object):
        def process_request(self, request, spider):
            pr
            skip_urls = ['/auth', '/preferences']
            for bad_url in skip_urls:
                if bad_url in request.url:
                    return IgnoreRequest()
                else:
                    return request
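A minimal sketch of how this middleware might look instead, assuming the intent is to drop every request whose URL contains one of the listed fragments. In Scrapy, a downloader middleware's `process_request` drops a request by raising `IgnoreRequest` and lets it proceed by returning `None`. If the `else` belongs to the `if`, the loop above also returns on its first iteration, so `/preferences` is never checked. `FakeRequest` and the fallback `IgnoreRequest` class are hypothetical stand-ins so the logic can be tried outside a crawl.

```python
from collections import namedtuple

try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:
    class IgnoreRequest(Exception):
        """Stand-in so the sketch runs without Scrapy installed."""


SKIP_URLS = ("/auth", "/preferences")


class URLFilterMiddleware(object):
    def process_request(self, request, spider):
        # Check every fragment before deciding, then raise (not return) to drop.
        if any(bad_url in request.url for bad_url in SKIP_URLS):
            raise IgnoreRequest("filtered: %s" % request.url)
        # Returning None lets the request continue through the other middlewares.
        return None


# Hypothetical minimal request object exposing only the `url` attribute.
FakeRequest = namedtuple("FakeRequest", "url")

mw = URLFilterMiddleware()
print(mw.process_request(FakeRequest("http://example.com/items/1"), spider=None))
try:
    mw.process_request(FakeRequest("http://example.com/auth/login"), spider=None)
except IgnoreRequest as exc:
    print("dropped:", exc)
```

The substring test is kept from the original; it would also match unrelated URLs that merely contain `/auth`, so a stricter pattern may be needed.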