Question

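I have this code, which only executes the callback of the first yield and not the next one. I have tried reordering them, with the same result: only the callback of the first yield gets executed.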

    for j in range(totalOrderPages):  # the code gets in the loop
        productURI = feedUrl % (productId, j + 1)
        print "Got in the loop" # this gets printed 
        yield response.follow(productURI, self.parse_orders, meta={'pid': productId, 'categories': categories})
    yield response.follow(first_page, self.parse_product, meta={'pid': productId, 'categories': categories})

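Is there anything in Python or Scrapy that prevents yielding twice in a row?

Second question: I am trying to debug with pdb.set_trace(), but when I try to execute a yield from the debug console I get a "yield outside function" error.

Does anyone know how to debug yields?

Thanks.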

Answer 1

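Without knowing more details, such as the redirect behavior of the site or the contents of the variables (feedUrl, productURI, first_page, etc.), my guess is that the requests are being dropped by the Dupefilter (https://doc.scrapy.org/en/latest/topics/settings.html#dupefilter-class). I suggest enabling the DEBUG log level and setting DUPEFILTER_DEBUG=True, then checking the logs to see whether that is the case. You can force a request to bypass the Dupefilter by adding dont_filter=True when calling response.follow.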
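As a rough sketch of that suggestion, the snippet below restates the loop from the question with dont_filter=True on every request, plus a settings dict for the logging options. The names follow_product_pages, DUPEFILTER_DEBUG_SETTINGS, and the parameter names are hypothetical and only for illustration; the function assumes `response` is a Scrapy response and `spider` has the parse_orders and parse_product callbacks from the question.

```python
# Settings named in the answer. Assumption: you either assign this dict to the
# spider's `custom_settings` attribute or copy the keys into settings.py.
DUPEFILTER_DEBUG_SETTINGS = {
    "LOG_LEVEL": "DEBUG",      # show debug-level log messages
    "DUPEFILTER_DEBUG": True,  # log every filtered duplicate, not only the first
}


def follow_product_pages(spider, response, feed_url, first_page,
                         product_id, categories, total_order_pages):
    """Yield the requests from the question, bypassing the dupefilter.

    Hypothetical helper: in a real spider this body would live inside the
    callback that currently contains the loop.
    """
    meta = {"pid": product_id, "categories": categories}
    for page in range(1, total_order_pages + 1):
        product_uri = feed_url % (product_id, page)
        # dont_filter=True tells the scheduler not to drop this request
        # even if the same URL was already seen.
        yield response.follow(product_uri, spider.parse_orders,
                              meta=dict(meta), dont_filter=True)
    yield response.follow(first_page, spider.parse_product,
                          meta=dict(meta), dont_filter=True)
```

If the missing callback starts running with dont_filter=True, the dupefilter was most likely the cause. Keeping the flag on permanently can lead to the same page being crawled more than once, so it is probably best treated as a diagnostic step.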

If this does not solve your problem, please share your crawl logs so we have more information to debug the issue. Happy scraping!
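On the second question: `yield` is only valid inside a function body, so typing it at the pdb prompt raises the "yield outside function" error. One possible workaround, assuming you only want to see what a callback produces, is to call the callback to get its generator and advance it by hand with next(). The parse function below is a hypothetical stand-in for a spider callback and yields plain strings instead of requests.

```python
def parse(pages):
    """Hypothetical stand-in for a spider callback with two yield sites."""
    for page in range(1, pages + 1):
        yield "orders page %d" % page  # plays the role of the yield in the loop
    yield "product page"               # plays the role of the final yield


gen = parse(2)       # calling a generator function runs none of its body yet
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes and runs up to the next yield
rest = list(gen)     # drains whatever is left
```

Inside a real callback stopped at pdb.set_trace(), the expression that follows `yield` can also be evaluated without the keyword, for example `p response.follow(productURI, self.parse_orders)`, to inspect the request that would be produced.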

Yielding twice in a row, only the first one works
