Scraping with Cheerio and Request returns 503/403 errors

Posted: 2018-09-01 00:26:44

Tags: node.js web-scraping request cheerio

I'm trying to scrape Barneys with Cheerio and Request, but without success.

Here is the code:

const request = require('request');
const cheerio = require('cheerio');

const options = {
  method: 'GET',
  url: 'https://www.barneys.com/designer-list/designerlist.jsp?cgid=BNY-men',
  headers: {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9,ko;q=0.8",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "referer": "https://www.barneys.com/",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
  }
};

request(options, function(err, resp, html) {
  if (!err && resp.statusCode === 200) {
    const $ = cheerio.load(html);
    console.log('html', html);
  } else {
    console.log('html:', html);
    console.log('err: ', err);
    // resp is undefined when the request itself failed (e.g. a network error)
    console.log('response statusCode ', resp && resp.statusCode);
    console.log('resp body', resp && resp.body);
  }
});
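One thing worth checking in the code above, although it is not what causes this particular 503 (that error page came back as plain HTML): the headers advertise `Accept-Encoding: gzip, deflate, br`. As far as I know from the request README, request only decodes gzip and deflate bodies, and only when its `gzip: true` option is set, so a successful compressed response (especially a Brotli one) would reach `cheerio.load` as unreadable bytes. Below is a minimal sketch using only Node's built-in `https` and `zlib` modules that decodes the body according to `Content-Encoding`. The helper names `decodeBody` and `fetchPage` are hypothetical, and `zlib.brotliDecompressSync` assumes Node 10.16+ or 11.7+.

```javascript
const https = require('https');
const zlib = require('zlib');

// Decode a response body according to its Content-Encoding header.
// HTTP "deflate" is zlib-wrapped, which is what zlib.inflateSync expects.
function decodeBody(buffer, encoding) {
  switch ((encoding || '').trim().toLowerCase()) {
    case 'gzip':
      return zlib.gunzipSync(buffer).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(buffer).toString('utf8');
    case 'br':
      return zlib.brotliDecompressSync(buffer).toString('utf8');
    default:
      // No (or unknown) encoding: treat the bytes as UTF-8 text.
      return buffer.toString('utf8');
  }
}

// Fetch a URL with the given headers; resolves with { statusCode, body }.
function fetchPage(url, headers) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { headers }, (res) => {
        const chunks = [];
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => {
          try {
            const body = decodeBody(Buffer.concat(chunks), res.headers['content-encoding']);
            resolve({ statusCode: res.statusCode, body });
          } catch (e) {
            reject(e); // e.g. the body was not valid gzip/deflate/br data
          }
        });
        res.on('error', reject);
      })
      .on('error', reject);
  });
}
```

Usage would look like `fetchPage(options.url, options.headers).then(({ statusCode, body }) => ...)`, with `cheerio.load(body)` on a 200. Alternatively, staying with request, dropping `br` from the header and passing `gzip: true` should let request handle the decoding itself.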

The output of this function is:

err:  null
response statusCode  503
resp body <HTML><HEAD>
<TITLE>Service Unavailable</TITLE>
</HEAD><BODY>
<H1>Service Unavailable - DNS failure</H1>
The server is temporarily unable to service your request.  Please try again
later.<P>
Reference&#32;&#35;11&#46;76908143&#46;1535760722&#46;f55418e
</BODY></HTML>
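When logging failures like this one, it can help to record the page title and the reference ID, which in the body above is written with numeric HTML character references (`&#32;` is a space, `&#35;` is `#`, `&#46;` is `.`). The sketch below decodes those references and extracts both values with plain regular expressions. The function names are hypothetical, and the `Reference #...` pattern is inferred from this single response; other error pages may be formatted differently.

```javascript
// Decode numeric HTML character references such as "&#32;" or "&#46;".
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)));
}

// Pull the <TITLE> text and the "Reference #..." ID out of an error page
// shaped like the one above. Fields are null when nothing matches.
function summarizeErrorPage(html) {
  const decoded = decodeNumericEntities(html);
  const title = /<title>([\s\S]*?)<\/title>/i.exec(decoded);
  const ref = /Reference\s*#([\w.]+)/.exec(decoded);
  return {
    title: title ? title[1].trim() : null,
    reference: ref ? ref[1] : null,
  };
}
```

In the `else` branch of the request callback, `console.log(summarizeErrorPage(resp.body))` would print the title `Service Unavailable` and the reference `11.76908143.1535760722.f55418e` for the body shown above.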

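I did some research and found that some websites block requests sent with a non-browser User-Agent, which is why I set my own request headers to mimic Chrome. However, it still doesn't work. Any help would be greatly appreciated.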
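Separately from the User-Agent question, the error page itself says the server is "temporarily unable to service your request" and asks the client to "try again later". If the 503 really is transient, a retry with exponential backoff is the usual response; if it is a deliberate block, retrying will not help and may make things worse. Here is a minimal sketch, assuming a hypothetical `fetchOnce` function that returns a promise of an object with a `statusCode` property. The `retries` and `baseDelayMs` defaults are arbitrary.

```javascript
// Hypothetical helper: call fetchOnce() until it returns a non-503 status
// or the retries run out, waiting baseDelayMs * 2^attempt between tries.
async function fetchWithRetry(fetchOnce, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const result = await fetchOnce();
    if (result.statusCode !== 503 || attempt >= retries) {
      return result; // success, a different error, or out of retries
    }
    const delay = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

`fetchOnce` can be any promise-returning function, for example a small wrapper that resolves with `resp` from the request callback in the question. Only 503 is retried here; a 403 usually means the request was refused outright, so retrying it is unlikely to help.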
