node.js request.get responds with an HTML body, ignoring the 404 status

Date: 2015-12-29 13:13:44

Tags: javascript html node.js

I'm having trouble with the node.js request module. I need to get the HTML body of a page, so I make a GET request to the URL like this:

var request = require('request');

var headers = { 
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/40.0',
    'Content-Type' : 'application/x-www-form-urlencoded' 
};
var url = "http://shop.nag.ru/catalog/14019.Шкафы-телекоммуникационные/14020.Напольные-шкафы/14024.600x600/08061.SNR-TFC-376060-G";

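request.get({url: url, headers: headers }, function (err, response, body) {
  if (err) {
    // Network-level failure: there is no response object to inspect
    return console.error(err);
  }
  console.log("status: " + response.statusCode);
  console.log("body: " + body);
});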

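It responds with an HTML body. However, if you open the same link in a browser, you get a page with a 404 error: http://shop.nag.ru/catalog/14019.Шкафы-телекоммуникационные/14020.Напольные-шкафы/14024.600x600/08061.SNR-TFC-376060-G (the URL contains Russian letters). So what is the problem? Why doesn't request return the response with a 404 status?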

1 Answer:

Answer 0 (score: 3)

Try encoding the URL. Most browsers do this by default, so when you open a URL like this in a browser:

http://shop.nag.ru/catalog/14019.Шкафы-телекоммуникационные/14020.Напольные-шкафы/14024.600x600/08061.SNR-TFC-376060-G

you are actually sending the request to:

http://shop.nag.ru/catalog/14019.%D0%A8%D0%BA%D0%B0%D1%84%D1%8B-%D1%82%D0%B5%D0%BB%D0%B5%D0%BA%D0%BE%D0%BC%D0%BC%D1%83%D0%BD%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%BE%D0%BD%D0%BD%D1%8B%D0%B5/14020.%D0%9D%D0%B0%D0%BF%D0%BE%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5-%D1%88%D0%BA%D0%B0%D1%84%D1%8B/14024.600x600/08061.SNR-TFC-376060-G
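To see where that long string comes from, here is a small sketch that runs the question's URL through the built-in encodeURI function. Each Cyrillic character is converted to its UTF-8 bytes, and each byte is written as %XX; ASCII letters, digits and characters such as ":", "/", "." and "-" are left unchanged. It only demonstrates the encoding step and makes no network request.

```javascript
// The raw URL from the question, with Cyrillic characters in the path.
const raw =
  "http://shop.nag.ru/catalog/14019.Шкафы-телекоммуникационные/14020.Напольные-шкафы/14024.600x600/08061.SNR-TFC-376060-G";

// encodeURI percent-encodes every non-ASCII character as UTF-8 bytes
// (e.g. "Ш" -> "%D0%A8") and keeps URL delimiters like ":" and "/" as-is.
const encoded = encodeURI(raw);
console.log(encoded);

// decodeURI reverses the transformation for this URL.
console.log(decodeURI(encoded) === raw); // true
```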

However, when you send the request with the Node request module, the URL is not encoded for you, so you have to do it yourself:

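// encodeURI percent-encodes the Cyrillic characters (as UTF-8)
// while keeping reserved characters such as ":" and "/" unchanged
request.get({url: encodeURI(url), headers: headers }, function (err, response, body) {
    // ...
});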
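As an alternative sketch (not part of the original answer), the WHATWG URL class that is global in Node.js 10 and later parses a URL the way browsers do, which includes percent-encoding non-ASCII path characters as UTF-8. One difference from encodeURI: the parser does not re-encode "%", so passing an already-encoded URL through it does not double-encode it, whereas encodeURI turns "%" into "%25". The sketch assumes a Node.js version where URL is available as a global.

```javascript
// Alternative sketch: let the WHATWG URL parser do the percent-encoding.
const raw =
  "http://shop.nag.ru/catalog/14019.Шкафы-телекоммуникационные/14020.Напольные-шкафы/14024.600x600/08061.SNR-TFC-376060-G";

// Parsing percent-encodes the non-ASCII path characters as UTF-8.
const normalized = new URL(raw).href;
console.log(normalized === encodeURI(raw)); // true for this URL

// Parsing an already-encoded URL leaves its %XX escapes alone...
console.log(new URL(normalized).href === normalized); // true

// ...while encodeURI encodes "%" again, producing "%25D0%25A8..." instead.
console.log(encodeURI(normalized) === normalized); // false
```

With request, this would mean passing `url: new URL(url).href` instead of `url: encodeURI(url)`.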