Question

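So I'm experimenting with Node.js. I want to build a simple crawler that scans a page and then returns all of its links in a JSON file. However, when I run the script, it returns 0 links.

Here is my full code: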

lock

The terminal output looks like this:

$ !!

node nodetest.js

{"table":[]}

Can anyone see why this is empty? Bonus points for writing the final JSON to a file :)

Answer 1

You have to use obj inside the request's success callback, because that is where it gets populated:

request(url, function(err, resp, body) {
    if (err) {
        // The request failed, so there is no body to parse
        console.error(err);
        return;
    }

    var $ = cheerio.load(body);
    var links = $('a'); // jQuery-style selector: get all hyperlinks

    $(links).each(function(i, link) {
        var actualLink = $(link).attr('href');
        obj.table.push({id: i, url: actualLink}); // add some data
    });

    // Only here can you be sure that the "obj" variable is properly
    // populated, because this is where the HTTP request completes
    var json = JSON.stringify(obj);
    console.log(json);
});

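In your code, you placed console.log outside the request callback. The request is asynchronous, so at that point the callback has not run yet and obj has not been populated.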
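To make the ordering visible, here is a minimal sketch that runs without any network access. The fakeRequest function is a hypothetical stand-in for request (it just calls back on a later tick with a hard-coded HTML string), and the regular expression is only a crude substitute for cheerio in this demo, not something to use on real pages.

```javascript
// Hypothetical stand-in for request(): invokes the callback asynchronously
// with a fixed HTML body, the way a real HTTP request would call back later.
function fakeRequest(url, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, '<a href="/a">A</a><a href="/b">B</a>');
    }, 0);
}

var obj = { table: [] };
var seen = []; // records what each point in the program observed, in order

fakeRequest('http://example.com', function (err, resp, body) {
    // Crude href extraction, for this demo only (cheerio does this properly)
    var re = /href="([^"]*)"/g;
    var match;
    while ((match = re.exec(body)) !== null) {
        obj.table.push({ id: obj.table.length, url: match[1] });
    }
    seen.push('inside callback: ' + obj.table.length + ' links');
    console.log(seen[seen.length - 1]);
});

// This line runs BEFORE the callback above, so obj.table is still empty here
seen.push('after request call: ' + obj.table.length + ' links');
console.log(seen[seen.length - 1]);
```

Running this prints the "after request call" line first, with 0 links, and only then the "inside callback" line with 2 links. That is the same effect you are seeing with your console.log.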

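Also note that you don't need your own i variable. The index is passed to the each callback automatically, so you don't have to declare or increment it yourself.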

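As for writing the result to a file, you can use the fs.writeFile function (fs is Node's built-in file system module, loaded with require('fs')). Call it inside the request callback, where json is available: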

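fs.writeFile("/tmp/test", json, function(err) {
    if (err) {
        // Report the failure instead of silently ignoring it
        console.error("Could not save file: " + err.message);
        return;
    }
    console.log("File successfully saved");
});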
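If you want something you can run on its own, here is a rough sketch of the file-writing step in isolation. The saveLinks helper, the sample data, and the links.json file name in the OS temp directory are all assumptions made for illustration. In your crawler you would call the helper from inside the request callback with the real obj.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: serializes obj to JSON and writes it to filePath.
// Calls back with (err) on failure or (null, filePath) on success.
function saveLinks(obj, filePath, callback) {
    var json = JSON.stringify(obj, null, 2); // 2-space indent for readability
    fs.writeFile(filePath, json, 'utf8', function (err) {
        if (err) {
            return callback(err);
        }
        callback(null, filePath);
    });
}

// Sample data in the same shape the crawler builds (made up for this demo)
var obj = {
    table: [
        { id: 0, url: '/about' },
        { id: 1, url: 'http://example.com/' }
    ]
};

// Assumed output location: a file in the OS temp directory
var outFile = path.join(os.tmpdir(), 'links.json');

saveLinks(obj, outFile, function (err, savedPath) {
    if (err) {
        console.error('Could not save file: ' + err.message);
        return;
    }
    console.log('Saved ' + obj.table.length + ' links to ' + savedPath);
});
```

Like request, fs.writeFile is asynchronous, so anything that depends on the file being on disk belongs in its callback.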

Node.js crawler to JSON: output is empty
