Scrapy: yielding new requests never reaches the for loop

Time: 2019-04-23 20:29:19

Tags: python web-scraping scrapy

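I want to scrape a series of Facebook posts. To do that, I log in and then load a list of post IDs, so that the login happens only once. However, when I try to issue the requests with yield, execution never enters the for loop.

Just for testing purposes, I changed the yield to return, and then execution does enter the for loop and calls the parse method.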

``` lang-py
import scrapy
from scrapy.http import FormRequest


class FacebookSpider(scrapy.Spider):
  name = "test"
  start_urls = ['https://mbasic.facebook.com']

  def parse(self, response):
    # Submit the login form, then continue in parse_home.
    return FormRequest.from_response(
      response,
      callback=self.parse_home,
      formxpath='//form[contains(@action, "login")]',
      formdata={'email': "email@email.com", 'pass': "password"},
    )

  def parse_home(self, response):
    print(">> parse_home")
    if response.xpath("//div/input[@value='Ok' and @type='submit']"):
      print(">> if condition")
      return FormRequest.from_response(response, formdata={'name_action_selected': 'dont_save'}, callback=self.parse_home, dont_filter=True,)

    for post in [1,2]:
      print(">> for loop")
      href = response.urljoin("/335653391129/posts/10157014203171130".format(post))
      yield scrapy.Request(url=href, callback=self.parse_page, dont_filter=True,)

  def parse_page(self, response):
    print("____ parse_page  _________")
```

The output when using yield is:

>> parse_home
>> if condition

The output after changing only the yield to return is:

>> parse_home
>> if condition
>> parse_home
>> for loop
____ parse_page  _________

I don't understand what is happening. Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Your `parse_home` method is a generator, and you should not use `return` inside a generator. However, I tested your code and it seems to work fine.
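To illustrate the point about generators, here is a minimal sketch in plain Python 3 with no Scrapy involved. The function names and the string "requests" are hypothetical stand-ins for `parse_home` and the Scrapy request objects. In Python 3, `return value` inside a generator function is legal, but the value is attached to `StopIteration` and never reaches a caller that simply iterates over the generator, which would explain why the confirmation request disappears in the yield version.

```python
def callback_with_return(needs_confirmation):
    """Stand-in for parse_home: the yield below makes this a generator."""
    if needs_confirmation:
        # In a generator, this value becomes StopIteration.value;
        # code that iterates over the generator never sees it.
        return "confirm-request"
    for post in [1, 2]:
        yield f"post-request-{post}"


def callback_with_yield(needs_confirmation):
    """Same logic, but every request is yielded."""
    if needs_confirmation:
        yield "confirm-request"
        return  # a bare return only stops the generator
    for post in [1, 2]:
        yield f"post-request-{post}"


# Consume the generators the way a framework iterating over a callback would.
dropped = list(callback_with_return(True))
kept = list(callback_with_yield(True))
posts = list(callback_with_yield(False))

print(dropped)  # the returned value is lost
print(kept)
print(posts)
```

Applied to the spider in the question, this would mean replacing `return FormRequest.from_response(...)` in `parse_home` with `yield FormRequest.from_response(...)` followed by a bare `return`.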

For more information, see Python SyntaxError: ("'return' with argument inside generator",)