Question

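I have a 2GB JSON file. I create a readStream from it with fs.createReadStream, pipe it through JSONStream.parse (https://github.com/dominictarr/JSONStream), and then push each record into the database. The setup works fine, but over time the whole process slows down, especially after about 200K records have been processed, and the process's memory usage seems to grow slowly.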
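Since memory seems to grow with the number of records, one way to confirm it is to sample process.memoryUsage() every N records and check whether heapUsed keeps rising. A minimal sketch; createMemoryProbe and the sampling interval are hypothetical, not part of the original code:

```javascript
// Hypothetical helper: call onRecord() once per parsed record. Every `every`
// records it samples process.memoryUsage().heapUsed (in MB) and logs it, so
// growth relative to the record count becomes visible.
function createMemoryProbe(every, log = console.log) {
  let count = 0;
  const samples = [];

  function onRecord() {
    count += 1;
    if (count % every !== 0) return;
    const heapMB = process.memoryUsage().heapUsed / (1024 * 1024);
    samples.push({ records: count, heapMB });
    log(`${count} records processed, heapUsed ${heapMB.toFixed(1)} MB`);
  }

  return { onRecord, samples };
}
```

Calling probe.onRecord() next to pushDatatoDB(record) in the data handler would log one line every N records. A heapUsed that climbs steadily with the record count, instead of levelling off, would suggest that work is piling up somewhere between the parser and the database.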

Initially there was an issue in JSONStream (https://github.com/dominictarr/JSONStream/issues/101), which was resolved by this change: https://github.com/dominictarr/JSONStream/pull/154. I'm now using version 1.3.4, so we can rule JSONStream out.

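I have checked the database (Cassandra) and there are no problems there, so I'm wondering whether this issue is related to backpressure (https://nodejs.org/en/docs/guides/backpressuring-in-streams/), but I'm not sure what the solution would be. Please share any thoughts or suggestions.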
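Backpressure only helps if the consumer tells the stream when it has finished with a record. A minimal sketch, assuming pushDatatoDB returns a Promise (the original does not show it): wrap the insert in an object-mode Writable whose write callback fires only after the insert settles. createDbWriter and fakeInsert are hypothetical names, Readable.from stands in for the file and JSONStream parser so the example runs on its own, and Node.js 16+ is assumed for node:stream/promises.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Wraps an async insert function in an object-mode Writable. The write
// callback runs only after the insert settles, so pipe/pipeline stop reading
// upstream once about `highWaterMark` records are queued and continue when
// the database catches up.
function createDbWriter(insertRecord) {
  return new Writable({
    objectMode: true,
    highWaterMark: 16, // counted in records, not bytes, because objectMode is on
    write(record, _encoding, callback) {
      // A rejected insert is forwarded as a stream error.
      insertRecord(record).then(() => callback(), callback);
    },
  });
}

async function demoSequential() {
  const stats = { inFlight: 0, maxInFlight: 0, inserted: 0 };
  // Hypothetical stand-in for pushDatatoDB: any function returning a Promise.
  const fakeInsert = async () => {
    stats.inFlight += 1;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated DB latency
    stats.inFlight -= 1;
    stats.inserted += 1;
  };
  const records = Array.from({ length: 100 }, (_, i) => ({ id: i }));
  // In the real program the source would be the JSONStream parser instead.
  await pipeline(Readable.from(records), createDbWriter(fakeInsert));
  return stats;
}

demoSequential().then((stats) => console.log(stats));
```

Wired into the original code this would look roughly like fs.createReadStream(path).pipe(JSONStream.parse('*')).pipe(createDbWriter(pushDatatoDB)), which is untested against JSONStream 1.3.4. The trade-off is that inserts run strictly one at a time.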

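const fs = require('fs');
const JSONStream = require('JSONStream');
const es = require('event-stream');

fs.createReadStream('../path/to/large/json/file')
  .pipe(JSONStream.parse('*')) // emit each child of the top-level JSON value as a record
  .pipe(processData());

function processData() {
  return es.mapSync((data) => {
    // pushDatatoDB (defined elsewhere) inserts one record into Cassandra.
    // If it is asynchronous, mapSync returns without waiting for the insert.
    pushDatatoDB(data);
  });
}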
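In this snippet es.mapSync calls pushDatatoDB and returns immediately; if pushDatatoDB is asynchronous, nothing tells the parser to slow down, so pending inserts can accumulate in memory. If awaiting every insert one by one is too slow, a middle ground is a Writable that allows a fixed number of inserts in flight and holds back its write callback once that limit is reached. A sketch under the assumption that pushDatatoDB returns a Promise; createConcurrentDbWriter, limit and the fake insert are hypothetical, and Readable.from replaces the real file and parser:

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Object-mode Writable that keeps at most `limit` inserts in flight. While
// the limit is reached the write callback is held back, so the upstream
// reader and parser pause instead of queueing unbounded work.
function createConcurrentDbWriter(insertRecord, limit) {
  let inFlight = 0;
  let heldCallback = null; // write callback waiting for a free slot
  let finalCallback = null; // end-of-stream callback waiting for the last inserts
  let firstError = null;

  function onSettled(err) {
    inFlight -= 1;
    if (err && !firstError) firstError = err;
    if (heldCallback) {
      const cb = heldCallback;
      heldCallback = null;
      cb(firstError);
    }
    if (finalCallback && inFlight === 0) {
      const cb = finalCallback;
      finalCallback = null;
      cb(firstError);
    }
  }

  return new Writable({
    objectMode: true,
    write(record, _encoding, callback) {
      if (firstError) return callback(firstError);
      inFlight += 1;
      insertRecord(record).then(() => onSettled(null), onSettled);
      if (inFlight < limit) callback(); // a slot is still free: accept the next record
      else heldCallback = callback; // at the limit: wait until an insert settles
    },
    final(callback) {
      // Do not report 'finish' until every started insert has settled.
      if (inFlight === 0) callback(firstError);
      else finalCallback = callback;
    },
  });
}

async function demoConcurrent() {
  const stats = { inFlight: 0, maxInFlight: 0, inserted: 0 };
  // Hypothetical stand-in for pushDatatoDB that returns a Promise.
  const fakeInsert = async () => {
    stats.inFlight += 1;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated DB latency
    stats.inFlight -= 1;
    stats.inserted += 1;
  };
  const records = Array.from({ length: 50 }, (_, i) => ({ id: i }));
  await pipeline(Readable.from(records), createConcurrentDbWriter(fakeInsert, 4));
  return stats;
}

demoConcurrent().then((stats) => console.log(stats));
```

The limit trades throughput against memory and database load; a suitable value would have to be found by measuring against the actual Cassandra cluster.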

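I tried the highWaterMark option so that I could process a fixed number of records at a time and then resume the stream to move further. This improved things slightly but did not fully solve the slowdown.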
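For reference, highWaterMark on fs.createReadStream is measured in bytes and only bounds how much the stream buffers; on its own it does not make the stream wait for database writes. The pause-then-resume idea can be tied to the database explicitly: pause the source when a batch is full and resume only after that batch has been stored. A minimal sketch; consumeInBatches, batchSize and insertBatch are hypothetical, and how a batch is actually written to Cassandra is left to insertBatch:

```javascript
const { Readable } = require('node:stream');

// Reads `source` in flowing mode and collects records into batches. When a
// batch is full the source is paused, the batch is handed to `insertBatch`
// (a hypothetical async function), and the source resumes only after that
// Promise resolves. Resolves with the number of records written.
function consumeInBatches(source, batchSize, insertBatch) {
  return new Promise((resolve, reject) => {
    let batch = [];
    let written = 0;
    let pending = Promise.resolve(); // chain that serialises batch inserts

    const flush = () => {
      const current = batch;
      batch = [];
      pending = pending
        .then(() => insertBatch(current))
        .then(() => { written += current.length; });
      return pending;
    };

    source.on('data', (record) => {
      batch.push(record);
      if (batch.length < batchSize) return;
      source.pause(); // no more 'data' events until this batch is stored
      flush().then(() => source.resume(), (err) => source.destroy(err));
    });

    source.on('end', () => {
      // 'end' may fire while the last full batch is still being written,
      // so wait for the chain before writing the final partial batch.
      const last = batch.length > 0 ? flush() : pending;
      last.then(() => resolve(written), reject);
    });

    source.on('error', reject);
  });
}

async function demoBatches() {
  const batchSizes = [];
  const records = Array.from({ length: 25 }, (_, i) => ({ id: i }));
  const written = await consumeInBatches(Readable.from(records), 10, async (batch) => {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated DB latency
    batchSizes.push(batch.length);
  });
  return { written, batchSizes };
}

demoBatches().then((result) => console.log(result));
```

With 25 records and a batch size of 10, insertBatch receives batches of 10, 10 and 5, and never more than one batch is being written at a time.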

I also tried an .on('data') handler, as shown below, but the problem persists:

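const fs = require('fs');
const JSONStream = require('JSONStream');

// For fs streams highWaterMark is in bytes: read the file in 2 KB chunks (default is 64 KB)
fs.createReadStream('../path/to/large/json/file', { highWaterMark: 2048 })
  .pipe(JSONStream.parse('*'))
  .on('data', (record) => {
    // pushDatatoDB (defined elsewhere) inserts one record; the 'data' handler does not wait for it
    pushDatatoDB(record);
  });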
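An .on('data') handler puts the stream into flowing mode, so records are emitted as fast as they are parsed, whether or not the previous insert has finished. An alternative is to consume the stream with for await...of, which only asks for the next record after the loop body, including the awaited insert, completes. A sketch; insertAll is a hypothetical helper and the insert function is assumed to return a Promise:

```javascript
const { Readable } = require('node:stream');

// The loop does not request the next record until the awaited insert has
// finished; meanwhile the stream buffers at most about its highWaterMark
// and then stops reading, which bounds memory.
async function insertAll(source, insertRecord) {
  let count = 0;
  for await (const record of source) {
    await insertRecord(record);
    count += 1;
  }
  return count;
}

async function demoIteration() {
  const seen = [];
  const records = Array.from({ length: 5 }, (_, i) => ({ id: i }));
  const count = await insertAll(Readable.from(records), async (record) => {
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated DB latency
    seen.push(record.id);
  });
  return { count, seen };
}

demoIteration().then((result) => console.log(result));
```

JSONStream's parser is built on the older through-stream style, which may not support async iteration directly. Wrapping it, e.g. new Readable({ objectMode: true }).wrap(parser), is one option, but that is an assumption not tested with JSONStream 1.3.4.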

Node.js readStream from a large JSON file causes the process to slow down over time
