Node.js stream with an array

Time: 2019-05-17 17:06:00

Tags: node.js firebase stream algolia

First time using Node streams: I'm trying to stream an array to Algolia. The example Algolia provides uses a JSON file. https://www.algolia.com/doc/guides/sending-and-managing-data/send-and-update-your-data/how-to/sending-records-in-batches/?language=javascript#example

I tried stringifying the array and then using it the way the Algolia example describes, but I'm not sure what the best approach is. Should I stringify the array, or do I need to loop over the array and push the items into a stream? Does the latter approach still use fs? This will run in a Firebase function, so there are resource constraints.

const algoliasearch = require('algoliasearch')
const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const client = algoliasearch('999999', '999999');
//const index = client.initIndex('d_DASH');
const index = client.initIndex('t_DASH');



exports.dashStream = async function (listings) {
    let jsdoc = JSON.stringify(listings);
    // NOTE: createReadStream expects a file path, so piping the
    // stringified array through it like this does not actually work
    const stream = fs.createReadStream(jsdoc).pipe(StreamArray.withParser());


    let chunks = [];
    stream
        .on('data', ({ value }) => {
            console.log("on data...")
            chunks.push(value);
            if (chunks.length === 10000) {
                stream.pause();
                index
                    .saveObjects(chunks)
                    .then(res => {
                        chunks = [];
                        stream.resume();
                    })
                    .catch(err => console.error(err));
            }
        })
        .on('end', () => {
            console.log("on end...")
            if (chunks.length) {
                console.log(`stream over?`)
                index.saveObjects(chunks,function (err, content){
                    return content.taskID.toString();
                })
                .catch(err => console.error(err));
            }
        })
        .on('error', err => console.error(err));
}
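For reference, an in-memory array does not need to be stringified or read back through fs at all: since Node 12 (backported to 10.17), Readable.from wraps any iterable in an object-mode stream, one array element per chunk. A minimal sketch with made-up listings data:

import { Readable } from 'stream';

// Hypothetical stand-in for the real listings array
const listings = [
    { objectID: '1', title: 'first' },
    { objectID: '2', title: 'second' },
];

// Readable.from turns any iterable into an object-mode stream,
// so there is no need to write the array to disk first
const source = Readable.from(listings);

source.on('data', record => console.log(record));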

The code needs to finish the writes to Algolia and return the taskID from the Algolia response.
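Assuming the v3 algoliasearch JavaScript client used in the snippet above, saveObjects returns a promise when called without a callback, and the resolved content carries the taskID of the indexing job. A sketch (the helper name is made up; index is the initialized index from the question):

async function saveAndGetTaskID(chunks: object[]): Promise<string> {
    // Without a callback, saveObjects resolves with the batch response
    const content = await index.saveObjects(chunks);
    // Optionally block until Algolia has finished processing the batch
    await index.waitTask(content.taskID);
    return content.taskID.toString();
}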

1 Answer:

Answer 0 (score: 0)

This is way too late, but maybe it will help someone else. I would recommend using different streams and pipeline.

buffer.stream.ts

A stream that collects data into chunks and pushes them to the write stream



import { Transform } from "stream";

// Using transform stream to collect your data into chunks
export class BufferStream extends Transform {
  private readonly buffer: object[] = [];

  constructor() {
    super({ objectMode: true });
  }

  _transform(data: object, _encoding: string, callback: () => void) {
    this.buffer.push(data);
    // Your chunk size goes here. I usually use 1000
    if (this.buffer.length >= 3) {
      // Handling back pressure if push() returns false
      if (this.push(this.buffer.splice(0))) {
        callback();
      } else {
        this._read = () => {
          delete this._read;
          callback();
        };
      }
    } else {
      callback();
    }
  }

  _final(callback: () => void) {
    // Pushing leftovers (only if anything is actually buffered,
    // to avoid emitting an empty chunk)
    if (this.buffer.length) {
      this.push(this.buffer);
    }
    callback();
  }
}
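A note on the _read override above: when push() returns false, the transform withholds its callback() until the readable side requests more data by calling _read; because the callback is pending, no further chunks are pulled from upstream, which is how backpressure propagates through the pipeline without manual pause()/resume() calls.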

write.stream.ts

A stream that handles writing to the destination

You have to be extra careful when using asynchronous callbacks with streams

import { Writable } from 'stream';

export class WriteStream extends Writable {
  constructor() {
    super({ objectMode: true });
  }

  async _write(chunk: object[], _encoding: string, callback: () => void) {
    // You have to handle your errors yourself in an asynchronous callback
    try {
      // await save(chunk);
      console.log('Chunk:', chunk)
      callback();
    } catch (error) {
      // nextTick to escape current stack
      process.nextTick(() => this.emit('error', error));
    }
  }
}
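To tie this back to the question, the // await save(chunk) placeholder above is where the Algolia call would go. A sketch assuming the v3 client from the question (the class name and the taskIDs field are made up):

import { Writable } from "stream";

// Hypothetical Algolia-flavoured variant of WriteStream that
// saves each chunk and collects the returned taskIDs
export class AlgoliaWriteStream extends Writable {
  readonly taskIDs: number[] = [];

  constructor(private readonly index: any) {
    super({ objectMode: true });
  }

  async _write(chunk: object[], _encoding: string, callback: () => void) {
    try {
      // saveObjects resolves with the batch response, including taskID
      const content = await this.index.saveObjects(chunk);
      this.taskIDs.push(content.taskID);
      callback();
    } catch (error) {
      // nextTick to escape the current stack, as above
      process.nextTick(() => this.emit("error", error));
    }
  }
}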

Gluing it all together

import stream, { PassThrough } from "stream";
import { promisify } from "util";
import { BufferStream } from "./buffer.stream";
import { WriteStream } from "./write.stream";

const pipelineAsync = promisify(stream.pipeline);

(async () => {
  // Imitating a stream with some data
  // (renamed from `stream` to avoid shadowing the module import)
  const source = new PassThrough({ objectMode: true });
  for (let i = 0; i < 10; i++) {
    source.push(i);
  }
  source.end();

  await pipelineAsync(source, new BufferStream(), new WriteStream());
})();
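Output

With a chunk size of 3 and ten values pushed in, running this should log something like:

Chunk: [ 0, 1, 2 ]
Chunk: [ 3, 4, 5 ]
Chunk: [ 6, 7, 8 ]
Chunk: [ 9 ]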

In my experience, this handles backpressure on its own. You don't need to call stream.pause() and .resume(), and on the whole it looks cleaner.