Question

我在正确的方向上寻求一点点了解Node工作者。我目前有节点代码从文件中读取数据，并使用网络请求执行一系列后续操作。我对数据执行的所有操作当前都在读取函数的回调中进行。

我努力解决的问题是如何最好地采用这种单一读取功能（这几乎肯定不会减慢我的应用程序速度 - 我相当肯定它是后来的请求我＆＃39 ; d喜欢分支），并将操作分成多个子进程。当然，我不想在同一行数据上多次执行我的操作，而是想让每个工作人员分一杯羹。我最好的选择是，在read-callback中，创建几个包含部分数据的数组，然后在回调之外向每个worker提供一个数组？还有其他选择吗？我的最终目标是减少我的脚本运行 x 数据量所需的时间。

var request = require('request');
var request = request.defaults({
    jar: true
})
var yacsv = require('ya-csv');
//  Post Log-In Form Information to Appropriate URL -- Occurs only once per script-run -- Log in cookies saved for subsequent requests
request.post({
    url: 'xxxxx.com',
    body: "login_info",
    //  On Reponse...
}, function(error, res, body) {
    //  Instantiate CSV Reader
    var reader = yacsv.createCsvFileReader("somefile.csv");
    //  Read Data from CSV, Row by Row -- Function happens once per CSV-row
    // THIS IS WHAT I -THINK- I CAN SPLIT AMONG MULTIPLE WORKERS
    var readData = reader.addListener('data', function(data) {
            // Bind each field from a CSV row to a corresponding variable for ease of use
            //[Variables here]
            //  Second Request for Search Form -- Uses information from a single row to query more information from a database
            request.post({
                    url: 'xxxxx.com/form',
                    body: variable_with_csv_data,
                }, function(error, res, body) {
                    //  Parse the resulting page, then page elements to variables for ease of output
                }
            });
    });
});

Answer 1

群集模块不是线程的替代。群集模块允许您通过多个processes 平衡 http请求到相同的应用程序逻辑，而无需委派责任。

您正在尝试优化的是什么？整个过程需要很长时间吗？数据事件的单独处理是否缓慢？
你的数据库调用速度慢吗？
http请求是否会变慢？

另外，我会废除ya-csv模块，它看起来有些过时了。

线程：分工

1 个答案: