Question

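I'm trying to scrape 28 days of room rates from one particular hotel page. I suspect I'm being blocked, but I'm not sure.

I get some of the results, but not all of them. I've even tried various user agents, a download_delay of 30, enabling the HTTP cache, and so on.

Here is my Lua script: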

    function main(splash, args)
      splash.private_mode_enabled = false
      splash.js_enabled = true
      splash.images_enabled = false
      assert(splash:go(args.url))

      -- Poll until condition() returns true, sleeping 20 s between checks.
      function wait_for(splash, condition)
        while not condition() do
          splash:wait(20.0)
        end
      end

      -- Wait until the availability table is present in the DOM.
      wait_for(splash, function()
        return splash:evaljs("document.querySelector('ul.availability-table-revamp') != null")
      end)

      -- Wait another 30 s before capturing the rendered HTML.
      assert(splash:wait(30.0))
      splash:set_viewport_full()
      return {
        html = splash:html(),
      }
    end

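The page I'm scraping is [here][1].

How can I be sure that it is this page that is blocking me? The hotel page itself has no policy, but the booking engine's home page (of course) does...

I do have more code I could show, but my guess is that the Lua script is the only part that could fix this. If you want to see more, though, the full code is here :-)

I certainly hope you are smarter than I am (I think I already know the answer to that).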

Answer 1

Sometimes a network blocks the user's IP. Try using different proxy servers, because the page is accessible from my system.
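To tell a block apart from a page that simply did not finish rendering, it may help to classify each response before parsing it. The following is only a rough sketch: the function name `classify_response`, the set of "blocking" status codes, and the choice of the `availability-table-revamp` class name as a marker are assumptions made for illustration, and a site can block requests in ways this check would not catch.

```python
# Status codes that sites commonly (but not always) use to refuse scrapers.
# This set is an assumption, not something confirmed for this hotel site.
BLOCK_STATUSES = {403, 429, 503}


def classify_response(status, html, marker="availability-table-revamp"):
    """Return a rough label for a rendered response.

    status: HTTP status code of the response
    html:   rendered page source as a string
    marker: substring expected in a complete page (assumed here to be
            the class name the Lua script waits for)
    """
    if status in BLOCK_STATUSES:
        return "blocked"
    if status == 504:
        # Splash answers 504 when rendering exceeds its timeout.
        return "render-timeout"
    if marker not in html:
        # Page loaded, but the price table never appeared.
        return "missing-data"
    return "ok"
```

Logging this label per request would show whether the missing days come back as refusals, as timeouts, or as pages without the table, which points to different fixes (proxies versus shorter waits).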
