调用多个AWS Lambda不会并行处理

时间:2019-06-17 09:25:47

标签: node.js typescript amazon-web-services aws-lambda aws-sdk

我正在尝试从另一个lambda函数调用多个lambda函数(一个lambda函数,它将运行单独的并行进程)。第一个作为cron lambda运行,它仅从db查询文档,然后调用带有文档参数的另一个lambda。此cron lambda每五分钟运行一次,并正确查询文档。我正在用两个文档测试第二个lambda。问题在于,每次调用第二个lamda时,它只会处理一个文档-每次处理另一个文档时,它不会处理上一个调用中的文档:

例如:

  • doc 1
  • doc 2

第一次调用第二个lambda->处理文档1

第二次调用lambda->处理文档2

第二次调用lambda->处理文档1

第二个lambda的第四次调用->处理文档2

等等。

第一个(cron)lambda代码:

aws.config.update({
  region : env.lambdaRegion,
  accessKeyId: env.lambdaAccessKeyId,
  secretAccessKey: env.lambdaSecretAccessKey,
});

const lambda = new aws.Lambda({
  region: env.lambdaRegion,
});

exports.handler = async (event: any, context: any) => {
  context.callbackWaitsForEmptyEventLoop = false;

  return new Promise(async (resolve, reject) => {
    for (let i = 0; i < 100; i++) {
      const doc = await mongo.db.collection('docs').
        findOneAndUpdate(
          {
            status: 1,
            lambdaProcessing: null,
          },
          { $set: { lambdaProcessing: new Date() } },
          {
            sort: { processedAt: 1 },
            returnNewDocument: true,
          },
        );

      if (doc.value && doc.value._id) {
        const params = {
          FunctionName: env.lambdaName,
          InvocationType: 'Event',
          Payload: JSON.stringify({ docId: doc.value._id }),
        };

        lambda.invoke(params);
      } else {
        if (doc.lastErrorObject && doc.lastErrorObject.n === 0) {
          break;
        }
      }
    }
    resolve();
  });
};

第二个lambda函数:

exports.handler = async (event: any, ctx: any) => {
  ctx.callbackWaitsForEmptyEventLoop = false;

  if (event && event.docId) {
    const doc = await mongo.db.collection('docs').findById(event.docId);
    return await processDoc(doc);
  } else {
    throw new Error('doc ID is not present.');
  }
};

2 个答案:

答案 0 :(得分:0)

要在没有“难看的” cronjob解决方案的情况下并行运行多个lambda,我建议使用类型为Parallel的AWS step函数。您可以在serverless.yml中设置逻辑,该函数调用本身就是lambda函数。您可以通过callback的第二个参数传递数据。如果数据大于32kb,我还是建议您使用S3存储桶/数据库。

示例serverless.yml

stepFunctions:
  stateMachines:
    test:
      name: 'test'
      definition:
        Comment: "Testing tips-like state structure"
        StartAt: GatherData
        States:
          GatherData:
            Type: Parallel
            Branches:
              -
                StartAt: GatherDataA
                States:
                  GatherDataA:
                    Type: Task
                    Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-firstA"
                    TimeoutSeconds: 15
                    End: true
              -
                StartAt: GatherDataB
                States:
                  GatherDataB:
                    Type: Task
                    Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-firstB"
                    TimeoutSeconds: 15
                    End: true
            Next: ResolveData
          ResolveData:
            Type: Task
            Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-resolveAB"
            TimeoutSeconds: 15
            End: true

示例处理程序

module.exports.firstA = (event, context, callback) => {
  const data = {
    id: 3,
    somethingElse: ['Hello', 'World'],
  };
  callback(null, data);
};
module.exports.firstB = (event, context, callback) => {
  const data = {
    id: 12,
    somethingElse: ['olleH', 'dlroW'],
  };
  callback(null, data);
};

module.exports.resolveAB = (event, context, callback) => {
  console.log("resolving data from a and b: ", event);
  const [dataFromA, dataFromB] = event;
  callback(null, event);
};

更多信息,请参见

答案 1 :(得分:0)

关键是为我们要调用的每个lambda创建新的单独aws.Lambda()实例,然后我们必须解析并等待我们调用的每个lambda(promieses数组)。如果不需要等待被调用的lambda,就可以了,这样我们就不会浪费在AWS上的处理时间-这样,被调用的lambda就开始处理,然后在不等待其响应的情况下进行解析,因此主(cron)lambda可以解析。 >

固定(cron)lambda处理程序:

aws.config.update({
  region : env.lambdaRegion,
  accessKeyId: env.lambdaAccessKeyId,
  secretAccessKey: env.lambdaSecretAccessKey,
});

exports.handler = async (event: any, context: any) => {
  context.callbackWaitsForEmptyEventLoop = false;

  return new Promise(async (resolve, reject) => {
    const promises: any = [];
    for (let i = 0; i < 100; i++) {
      const doc = await global['mongo'].db.collection('docs').
        findOneAndUpdate(
          {
            status: 1,
            lambdaProcessing: null,
          },
          { $set: { lambdaProcessing: new Date() } },
          {
            sort: { processedAt: 1 },
            returnNewDocument: true,
          },
        );

      if (doc.value && doc.value._id) {
        const params = {
          FunctionName: env.lambdaName,
          InvocationType: 'Event',
          Payload: JSON.stringify({ docId: doc.value._id }),
        };

        const lambda = new aws.Lambda({
          region: env.lambdaRegion,
          maxRetries: 0,
        });

        promises.push(
          new Promise((invokeResolve, invokeReject) => {
            lambda.invoke(params, (error, data) => {
              if (error) { console.error('ERROR: ', error); }
              if (data) { console.log('SUCCESS:', data); }
              // Resolve invoke promise in any case.
              invokeResolve();
            });
          }),
        );
      } else {
        if (doc.lastErrorObject && doc.lastErrorObject.n === 0) {
          break;
        }
      }
    }
    await Promise.all(promises);
    resolve();
  });
};

第二个(处理中)lambda:

exports.handler = async (event: any, ctx: any) => {
  ctx.callbackWaitsForEmptyEventLoop = false;

  if (event && event.docId) {
    const doc = await mongo.db.collection('docs').findById(event.docId);
    processDoc(doc);
    return ctx.succeed('Completed.');
  } else {
    throw new Error('Doc ID is not present.');
  }
};

我不知道是否有使用严格的lambda函数实现此目标的更好方法,但这可行。