
I am using the following rule:

   Rule(SgmlLinkExtractor(deny=path_deny_base, deny_domains=deny_domains),
        callback='save_page', follow=True)

where path_deny_base is:

path_deny_base = [
    #'\?(.{80,200})',
    '/whois/',
    '/edit',
    '/login/',
    '/calendar/',
    '.*\?.*',
    '\?',
    '/search/',
    '/suche/',

]

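In this run I want to skip paths that contain a query string (?...) and a few others, yet I see that pages with URLs such as

http://example.com/login/?_cookie_set=yes....

have been downloaded.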

Any hints? Yes, I could try the shell, and I am doing that now...
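
One way to check the patterns outside a crawl is a small plain-Python sketch like the one below. It assumes the link extractor applies each deny pattern as a regular-expression search against the absolute URL. That is an assumption about Scrapy's behaviour, not something confirmed in this question. The helper name is_denied and the sample URLs are hypothetical.

```python
import re

# Patterns copied from path_deny_base above (the commented-out entry is omitted).
path_deny_base = [
    r'/whois/',
    r'/edit',
    r'/login/',
    r'/calendar/',
    r'.*\?.*',
    r'\?',
    r'/search/',
    r'/suche/',
]


def is_denied(url, patterns=path_deny_base):
    """Return True if any deny pattern is found anywhere in the URL.

    Assumption: the extractor uses search semantics (match anywhere),
    not a full-string match.
    """
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical sample URLs, modelled on the one in the question.
print(is_denied('http://example.com/login/?_cookie_set=yes'))
print(is_denied('http://example.com/about/'))
```

If the login URL is reported as denied here, the patterns themselves are probably not the problem, and it may be worth checking whether the request reaches the spider by some route the rule does not filter, such as a redirect or a start URL.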

scrapy CrawlSpider: denying paths in SgmlLinkExtractor does not seem to work

0 answers: