Question

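I just started playing around with node.js and postgres, using node-postgres. One of the things I tried to do is write a short js script to populate my database from a file with about 200,000 entries.

I noticed that after some time (less than 10 seconds), I start to get "Error: Connection terminated". I am not sure whether this is a problem with how I use node-postgres, or whether it is because I was spamming postgres.

Anyway, here is a simple piece of code that shows this behaviour: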

var pg = require('pg');
var connectionString = "postgres://xxxx:xxxx@localhost/xxxx";

pg.connect(connectionString, function(err,client,done){
  if(err) {
    return console.error('could not connect to postgres', err);
  }

  client.query("DROP TABLE IF EXISTS testDB");
  client.query("CREATE TABLE IF NOT EXISTS testDB (id int, first int, second int)");
  done();

  for (i = 0; i < 1000000; i++){
    client.query("INSERT INTO testDB VALUES (" + i.toString() + "," + (1000000-i).toString() + "," + (-i).toString() + ")",   function(err,result){
      if (err) {
         return console.error('Error inserting query', err);
      }
      done();
    });
  }
});

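It fails after about 18,000-20,000 queries. Is this the wrong way to use client.query? I tried changing the default client number, but it didn't seem to help.

client.connect() doesn't seem to help either, but that was because I had too many clients, so I definitely think client pooling is the way to go.

Thanks for any help!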

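Answer 1

UPDATE

This answer has since been superseded by this article: Data Imports, which represents the most up-to-date approach.

In order to replicate your scenario I used the pg-promise library, and I can confirm that a head-on attempt will never work, no matter which library you use; it is the approach that matters.

Below is a modified approach in which we partition the inserts into chunks and then execute each chunk within a transaction, which amounts to load balancing (aka throttling):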

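// Assumes `db` is a pg-promise database object, `pgp` is the initialized
// pg-promise library, and `promise` is the promise library in use.
function insertRecords(N) {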
    return db.tx(function (ctx) {
        var queries = [];
        for (var i = 1; i <= N; i++) {
            queries.push(ctx.none('insert into test(name) values($1)', 'name-' + i));
        }
        return promise.all(queries);
    });
}
function insertAll(idx) {
    if (!idx) {
        idx = 0;
    }
    return insertRecords(100000)
        .then(function () {
            if (idx >= 9) {
                return promise.resolve('SUCCESS');
            } else {
                return insertAll(++idx);
            }
        }, function (reason) {
            return promise.reject(reason);
        });
}
insertAll()
    .then(function (data) {
        console.log(data);
    }, function (reason) {
        console.log(reason);
    })
    .done(function () {
        pgp.end();
    });

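This produced 1,000,000 records in about 4 minutes, slowing down dramatically after the first 3 transactions. I was using Node JS 0.10.38 (64-bit), which consumed about 340MB of memory. This way we inserted 100,000 records, 10 times in a row.

If we do the same, only this time inserting 10,000 records in each of 100 transactions, the same 1,000,000 records are added in just 1m25s, with no slowing down and with Node JS consuming around 100MB of memory, which tells us that partitioning data like this is a very good idea.

It doesn't matter which library you use, the approach should be the same:

1. Partition/throttle your inserts into multiple transactions;
2. Keep the list of inserts in a single transaction at around 10,000 records;
3. Execute all your transactions in a synchronous chain;
4. Release the connection back to the pool after each transaction.

If you break any of those rules, you are going to run into trouble. For example, if you break rule 3, your Node JS process is likely to run out of memory very quickly and throw an error. Rule 4 in my example was taken care of by the library.

And if you follow this pattern, you don't need to trouble yourself with the connection pool settings.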
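To make rules 1-3 concrete, here is a minimal, library-independent sketch. The helper name insertInChunks and its runChunk callback are made up for illustration; in real code runChunk would wrap one transaction (for example a db.tx call), whereas here a stand-in just records the range it was given.

```javascript
// Hypothetical helper: splits `total` inserts into chunks of `chunkSize`
// and runs the chunks strictly one after another.
// `runChunk(start, end)` is an assumed callback that returns a promise
// for one transaction covering the records in [start, end).
function insertInChunks(total, chunkSize, runChunk) {
    function next(start) {
        if (start >= total) {
            return Promise.resolve();
        }
        var end = Math.min(start + chunkSize, total);
        // Start the next chunk only after the current one has finished.
        return Promise.resolve(runChunk(start, end)).then(function () {
            return next(end);
        });
    }
    return next(0);
}

// Stand-in for a real transaction: records the range, resolves asynchronously.
var ranges = [];
function fakeTransaction(start, end) {
    return new Promise(function (resolve) {
        setImmediate(function () {
            ranges.push([start, end]);
            resolve();
        });
    });
}

var finished = insertInChunks(25000, 10000, fakeTransaction).then(function () {
    console.log(ranges); // [ [ 0, 10000 ], [ 10000, 20000 ], [ 20000, 25000 ] ]
    return ranges;
});
```

Because each chunk is started from the completion handler of the previous one, only one chunk's worth of pending queries exists at any time, which is the point of rule 3.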

UPDATE 1

Later versions of pg-promise support such scenarios perfectly, as shown below:

function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.batch([
        this.none('drop table if exists test'),
        this.none('create table test(id serial, name text)'),
        this.sequence(factory), // key method
        this.one('select count(*) from test')
    ]);
})
    .then(function (data) {
        console.log("COUNT:", data[3].count);
    })
    .catch(function (error) {
        console.log("ERROR:", error);
    });

And if you do not want to include anything extra, like table creation, then it looks even simpler:

function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.sequence(factory);
})
    .then(function (data) {
        // success;
    })
    .catch(function (error) {
        // error;
    });

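See Synchronous Transactions for details.

Using Bluebird as the promise library, for example, it takes 1m43s on my production machine to insert 1,000,000 records (without long stack traces enabled).

You just have your factory method return requests according to the index until you have none left, simple as that.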
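For readers wondering what "return requests until none are left" means mechanically, here is a rough sketch of such a sequence runner. It is not pg-promise's actual implementation; runSequence is a hypothetical name, and the factory below resolves immediately instead of issuing a real query.

```javascript
// Hypothetical sequence runner (an illustration, not the library's code).
// Calls factory(index) for index = 0, 1, 2, ... and waits for each returned
// promise before asking for the next one. Stops when factory returns undefined.
function runSequence(factory) {
    return new Promise(function (resolve, reject) {
        function step(index) {
            var request;
            try {
                request = factory(index);
            } catch (e) {
                return reject(e);
            }
            if (request === undefined) {
                return resolve(index); // number of requests that were executed
            }
            Promise.resolve(request).then(function () {
                step(index + 1);
            }, reject);
        }
        step(0);
    });
}

// Stand-in factory: the real one would return this.query('insert ...').
var executed = [];
function factory(index) {
    if (index < 5) {
        executed.push('name-' + index);
        return Promise.resolve();
    }
}

var sequenceDone = runSequence(factory).then(function (count) {
    console.log(count, executed);
    return count;
});
```

Only one request is alive at a time and no long promise chain is built up, which would be consistent with the low memory use reported for this approach.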

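And the best part: this isn't just fast, it also puts very little load on the NodeJS process. The process stayed under 60MB of memory during the entire test, consuming only 7-8% of CPU time.

UPDATE 2

Starting with version 1.7.2, pg-promise supports super-massive transactions with ease. See chapter Synchronous Transactions.

For example, I could insert 10,000,000 records in a single transaction in just 15 minutes on my home PC, running Windows 8.1 64-bit.

For the test I set my PC to production mode and used Bluebird as the promise library. During the test, memory consumption didn't go over 75MB for the entire NodeJS 0.12.5 process (64-bit), while my i7-4770 CPU showed a consistent 15% load.

Inserting 100 million records the same way would just require more patience, not more computer resources.

In the meantime, the previous test for 1 million inserts dropped from 1m43s to 1m31s.

UPDATE 3

The following considerations can make a huge difference: Performance Boost.

UPDATE 4

Related question, with a better implementation example: Massive inserts with pg-promise.

UPDATE 5

A better and newer example can be found here: nodeJS inserting Data into PostgreSQL error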

Answer 2

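I'm guessing that you are reaching the max pool size. Since client.query is asynchronous, all the available connections are used up before any of them are returned.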
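The asynchronous part is easy to see without a database. In this small sketch, fakeQuery is a made-up stand-in for client.query that completes on a later turn of the event loop; the loop finishes queuing every call before a single callback runs.

```javascript
// Stand-in for client.query: completes on a later turn of the event loop.
var pending = 0;
var peakPending = 0;

function fakeQuery(sql, callback) {
    pending++;
    peakPending = Math.max(peakPending, pending);
    setImmediate(function () {
        pending--;
        callback(null);
    });
}

var completedDuringLoop = 0;
var loopRunning = true;

for (var i = 0; i < 1000; i++) {
    fakeQuery('INSERT ... ' + i, function () {
        // Counts callbacks that fire while the loop is still running.
        if (loopRunning) {
            completedDuringLoop++;
        }
    });
}
loopRunning = false;

console.log(peakPending);         // 1000: every query was queued up front
console.log(completedDuringLoop); // 0: no callback ran during the loop
```

With 1,000,000 iterations and a real driver, all of that queued work has to be held in memory at once.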

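The default pool size is 10. See: https://github.com/brianc/node-postgres/blob/master/lib/defaults.js#L27

You can increase the default pool size by setting pg.defaults.poolSize:

pg.defaults.poolSize = 20;

Update: execute another query only after a connection has been freed.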

var pg = require('pg');
var connectionString = "postgres://xxxx:xxxx@localhost/xxxx";
var MAX_POOL_SIZE = 25;

pg.defaults.poolSize = MAX_POOL_SIZE;
pg.connect(connectionString, function(err,client,done){
  if(err) {
    return console.error('could not connect to postgres', err);
  }

  var release = function() {
    done();
    i++;
    if(i < 1000000)
      insertQ();
  };

  var insertQ = function() {
    client.query("INSERT INTO testDB VALUES (" + i.toString() + "," + (1000000-i).toString() + "," + (-i).toString() + ")", function(err,result){
      if (err) {
         return console.error('Error inserting query', err);
      }
      release();
    });
  };

  client.query("DROP TABLE IF EXISTS testDB");
  client.query("CREATE TABLE IF NOT EXISTS testDB (id int, first int, second int)");
  done();

  for (i = 0; i < MAX_POOL_SIZE; i++){
    insertQ();
  }
});

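The basic idea is that, since you are enqueuing a large number of queries against a relatively small connection pool, you are reaching the max pool size. Here we make a new query only after an existing connection has been released.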
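The same idea, stripped of any database specifics, is a bounded-concurrency loop: keep at most a fixed number of operations in flight and start a new one only when a previous one finishes. The sketch below is an illustration under assumptions, not node-postgres API: runWithLimit and fakeInsert are hypothetical names, limit is assumed to be at least 1, and the task would be a real insert in practice.

```javascript
// Hypothetical helper: runs task(i) for i in [0, total) with at most
// `limit` tasks in flight at once. `task` is assumed to return a promise.
function runWithLimit(total, limit, task) {
    return new Promise(function (resolve, reject) {
        var next = 0;       // next index to start
        var active = 0;     // tasks currently in flight
        var failed = false;

        if (total === 0) {
            return resolve();
        }

        function startOne(index) {
            active++;
            // The executor turns a synchronous throw into a rejection.
            new Promise(function (res) {
                res(task(index));
            }).then(onDone, onError);
        }
        function launch() {
            while (!failed && active < limit && next < total) {
                startOne(next++);
            }
        }
        function onDone() {
            active--;
            if (failed) {
                return;
            }
            if (next >= total && active === 0) {
                return resolve();
            }
            launch(); // a slot was freed, so start the next task
        }
        function onError(err) {
            failed = true;
            reject(err);
        }
        launch();
    });
}

// Stand-in for an insert: tracks how many calls overlap.
var inFlight = 0, peak = 0, seen = [];
function fakeInsert(i) {
    inFlight++;
    peak = Math.max(peak, inFlight);
    return new Promise(function (resolve) {
        setImmediate(function () {
            inFlight--;
            seen.push(i);
            resolve();
        });
    });
}

var limitedDone = runWithLimit(100, 10, fakeInsert).then(function () {
    console.log(peak, seen.length); // peak never exceeds 10; all 100 tasks ran
    return { peak: peak, count: seen.length };
});
```

Each index is claimed exactly once when its task starts, so no row is skipped or inserted twice regardless of the order in which tasks complete.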

node-postgres with massive amount of queries
