Question

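I'm experimenting with Node.js and web scraping. In this case, I'm trying to collect the most recently played songs from a local radio station so I can display them. With this particular website, body comes back empty. When I try Google or any other website, body has a value. Is this a feature of the website I'm trying to scrape?

Here is my code: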

var request = require('request');

var url = "http://www.radiomilwaukee.org";
request(url, function(err,resp,body) {
    if (!err && resp.statusCode == 200) {
        console.log(body);
    }
    else
    {
        console.log(err);
    }

});

Answer 1

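This is strange: the website you're requesting doesn't seem to return anything unless the accept-encoding header is set to gzip. With that in mind, using this gist will work: https://gist.github.com/nickfishman/5515364

I ran the code in that gist, replacing the URL with "http://www.radiomilwaukee.org", and saw the page content in the sample.html file once the code had finished.

If you want to access the page content from within your code, you can do something like this: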

// ...

req.on('response', function(res) {
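    // start with an empty string so that += does not prepend "undefined"
    var body = '', encoding, unzipped;

    if (res.statusCode !== 200) throw new Error('Status not 200');

    encoding = res.headers['content-encoding'];
    if (encoding == 'gzip') {
        unzipped = res.pipe(zlib.createGunzip());
        unzipped.on("readable", function() {
            // collect the content in the body variable;
            // read() returns null once the internal buffer is drained
            var chunk;
            while ((chunk = unzipped.read()) !== null) {
                body += chunk.toString();
            }
        });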
    }

    // ...
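
For reference, the same idea can be sketched end to end with only Node's built-in http and zlib modules: send an accept-encoding header of gzip, buffer the response, and gunzip it when the server says the body is compressed. This is a minimal sketch, not the gist's code. The names decodeBody and fetchBody are made up for illustration, and it assumes a Node version where http.get accepts a URL string followed by an options object.

```javascript
const http = require('http');
const zlib = require('zlib');

// Hypothetical helper: turn the raw response bytes into a string,
// gunzipping first when the Content-Encoding header says gzip.
function decodeBody(buffer, encoding) {
    if (encoding === 'gzip') {
        return zlib.gunzipSync(buffer).toString('utf8');
    }
    return buffer.toString('utf8');
}

// Hypothetical helper: request a page while advertising gzip support,
// then hand the decoded body to callback(err, body).
function fetchBody(url, callback) {
    const options = { headers: { 'Accept-Encoding': 'gzip' } };

    http.get(url, options, function (res) {
        if (res.statusCode !== 200) {
            res.resume(); // discard the body so the socket is released
            return callback(new Error('Status not 200: ' + res.statusCode));
        }

        const chunks = [];
        res.on('data', function (chunk) {
            chunks.push(chunk);
        });
        res.on('end', function () {
            try {
                const raw = Buffer.concat(chunks);
                callback(null, decodeBody(raw, res.headers['content-encoding']));
            } catch (e) {
                callback(e); // e.g. the gzip data was corrupt
            }
        });
        res.on('error', callback);
    }).on('error', callback);
}
```

Calling fetchBody("http://www.radiomilwaukee.org", function (err, body) { ... }) would then pass the page's HTML to the callback as body, assuming the site still behaves as described above. Buffering the whole response before decompressing keeps the sketch short; for large pages, piping through zlib.createGunzip() as in the snippet above avoids holding the compressed copy in memory.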

Node.js: request body is empty for certain websites
