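Node.js: wrong encoding when fetching content from an external website

Time: 2017-05-22 02:59:23

Tags: javascript node.js get request

I use the get method of the request module to fetch the content of an external website. If the external site is encoded in UTF-8, everything works, but with other encodings, such as Shift-JIS, the text is displayed incorrectly.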

var mod_request = require('request');

function getExternalUrl(request, response, url){

    mod_request.get(url, function (err, res, body) {
    //mod_request.get({uri: url, encoding: 'binary'}, function (err, res, body) {
        if (err){
            console.log("\terr=" + err);
        }else{
            var result = res.body;
            // Process res.body
            response.write(result);
        }
        response.end();
    });
}

How can I fetch the content of an external website with the correct encoding?
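The symptom can be reproduced without any network call. The snippet below is a minimal sketch: the two bytes are the Shift_JIS encoding of "あ", chosen here purely for illustration, and they are decoded as UTF-8, which is effectively what request does when no encoding option is given. The variable names are illustrative only.

```javascript
// "あ" encoded as Shift_JIS is the two bytes 0x82 0xA0 (illustrative sample data).
const sjisBytes = Buffer.from([0x82, 0xa0]);

// Decoding those bytes as UTF-8 is what happens when the body is read with
// the default encoding. The bytes are not valid UTF-8, so they are replaced
// by U+FFFD replacement characters.
const wrongText = sjisBytes.toString('utf8');

// Re-encoding the damaged string does not give the original bytes back,
// so the page cannot be repaired after the fact.
const roundTripped = Buffer.from(wrongText, 'utf8');
const bytesSurvived = roundTripped.equals(sjisBytes);

console.log(JSON.stringify(wrongText), bytesSurvived);
```

Because the original bytes are lost at this point, any fix has to keep the raw bytes intact before decoding.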

1 answer:

Answer 0 (score: 0)

I found a way:

  1. Fetch the response with binary encoding

    var mod_request = require('request');
    mod_request.get({uri:url,encoding:'binary',headers:headers},function(err,res,body){});

  2. Create a Buffer

    Build it from the body, using the binary format

    var contentBuffer = Buffer.from(res.body,'binary'); // new Buffer() is deprecated

  3. Use the detect-character-encoding npm package

    Detect the actual encoding of the page

    var mod_detect_character_encoding = require('detect-character-encoding');
    var charsetMatch = mod_detect_character_encoding(contentBuffer);

  4. Use the iconv npm package

    Convert the page to utf-8

    var mod_iconv = require('iconv').Iconv;
    var iconv = new mod_iconv(charsetMatch.encoding,'utf-8');
    var result = iconv.convert(contentBuffer).toString();

  5. P.S.: This approach only applies to text files (HTML, CSS, JS). Do not use it for images or other non-text files.
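The byte-preserving part of the steps above can be sketched end to end with Node.js built-ins only. This is a sketch under stated assumptions, not the answer's exact method: the built-in TextDecoder stands in for the iconv package, the charset is passed in by hand instead of coming from detect-character-encoding, the response body is simulated rather than fetched, and decodeBinaryBody is a hypothetical helper name. Decoding Shift_JIS this way also assumes a Node.js build with full ICU data.

```javascript
// Hypothetical helper: turn a body fetched with encoding 'binary' into a
// correctly decoded JavaScript string.
function decodeBinaryBody(binaryBody, charset) {
    // 'latin1' is the canonical name of the 'binary' encoding: every
    // character of the string maps back to exactly one byte.
    const contentBuffer = Buffer.from(binaryBody, 'latin1');
    // Assumption: TextDecoder replaces the iconv conversion step here.
    return new TextDecoder(charset).decode(contentBuffer);
}

// Simulate what a request with encoding: 'binary' would hand back for a page
// containing "日本" in Shift_JIS (bytes 0x93 0xFA 0x96 0x7B).
const pageBytes = Buffer.from([0x93, 0xfa, 0x96, 0x7b]);
const binaryBody = pageBytes.toString('latin1');

// In the answer the charset comes from the detection step; here it is given.
const decoded = decodeBinaryBody(binaryBody, 'shift_jis');
console.log(decoded);
```

The key design point is the same as in the answer: the body travels as a one-byte-per-character string, so no bytes are lost before the real charset is applied.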