Question

I am trying to crawl the following page with Scrapy: http://www.t13.cl/home/d_ultimas/10. I am using:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class T13(CrawlSpider):
    name = 't13'
    allowed_domains = ["http://www.t13.cl"]
    start_urls = ['http://www.t13.cl/home/d_ultimas/10']

    rules = (
        Rule(LinkExtractor(allow=(r'.')),
             callback='parse_item'),
    )

    def parse_item(self, response):
        pass

But it only returns one link (the first one). Why doesn't it follow all the <a> links on that page? (If I use the shell, it does return all the selectors.)

Answer 1

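Your allowed_domains setting seems to be filtering your requests. It expects bare domain names, not URLs, so with the http:// prefix the links extracted from the page are treated as off-site and dropped. Change it to:

allowed_domains = ["t13.cl"]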
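
To illustrate why a URL-style entry in allowed_domains ends up dropping every extracted link, here is a minimal sketch of a hostname-based off-site check. The function is_allowed is a hypothetical helper written for this illustration; it is a simplified stand-in and not Scrapy's actual implementation.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals, or is a subdomain of, an allowed domain.

    Hypothetical, simplified off-site check for illustration only;
    this is not Scrapy's own code.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


link = "http://www.t13.cl/home/d_ultimas/10"

# An entry that includes the scheme can never match a bare hostname.
blocked = is_allowed(link, ["http://www.t13.cl"])

# A bare domain matches the host itself and any of its subdomains.
followed = is_allowed(link, ["t13.cl"])

print(blocked, followed)
```

Under this simplified model, the entry with the scheme rejects the link while the bare domain accepts it, which is consistent with the spider only fetching its start URL.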
