AWS Firehose transformation Lambda puts all messages into the same S3 folder

Time: 2018-09-05 18:10:00

Tags: java amazon-web-services aws-lambda amazon-kinesis amazon-kinesis-firehose

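I have a Kinesis stream, and I created a Firehose delivery stream that saves all the data to S3. The data was being saved correctly in hourly folders. Then I wrote a Firehose transformation Lambda, and after I deployed it all the messages go to the same folder. I'm not sure what I'm missing. My Lambda function's response contains the following fields: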

result.put("recordId", record.getRecordId());
result.put("result", "Ok");
result.put("approximateArrivalEpoch", record.getApproximateArrivalEpoch());
result.put("approximateArrivalTimestamp", record.getApproximateArrivalTimestamp());
result.put("kinesisRecordMetadata", record.getKinesisRecordMetadata());
result.put("data", Base64.getEncoder().encodeToString(jsonData.getBytes()));

Edit:

Here is my code in Java. I am using KinesisFirehoseEvent; decoding is not required in my case, and I get a ByteBuffer from KinesisFirehoseEvent.

public JSONObject handler(KinesisFirehoseEvent kinesisFirehoseEvent, Context context) {
    final LambdaLogger logger = context.getLogger();
    final JSONArray resultArray = new JSONArray();
    for (final KinesisFirehoseEvent.Record record : kinesisFirehoseEvent.getRecords()) {
        // The payload arrives as a ByteBuffer, so no base64 decoding is done here
        final byte[] data = record.getData().array();
        final Optional<TestData> testData = deserialize(data, logger);
        if (testData.isPresent()) {
            final JSONObject jsonObj = new JSONObject();
            final String jsonData = gson.toJson(testData.get());
            jsonObj.put("recordId", record.getRecordId());
            jsonObj.put("result", "Ok");
            jsonObj.put("approximateArrivalEpoch", record.getApproximateArrivalEpoch());
            jsonObj.put("approximateArrivalTimestamp", record.getApproximateArrivalTimestamp());
            jsonObj.put("kinesisRecordMetadata", record.getKinesisRecordMetadata());
            jsonObj.put("data", Base64.getEncoder().encodeToString(jsonData.getBytes()));
            resultArray.add(jsonObj);
        } else {
            // Records that fail to deserialize are only logged; they are not added to the response
            logger.log("testData not deserialized");
        }
    }
    // Wrap the per-record results in the top-level "records" array
    final JSONObject jsonFinalObj = new JSONObject();
    jsonFinalObj.put("records", resultArray);
    return jsonFinalObj;
}
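
A side note on reading the ByteBuffer in the handler above: `ByteBuffer.array()` returns the whole backing array, which can be larger than the readable region of the buffer, and it throws for read-only buffers. A minimal sketch of copying only the readable bytes, assuming the payload is the buffer's remaining region; `ByteBufferBytes` and `remainingBytes` are hypothetical names and not part of the AWS library:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ByteBufferBytes {

    // Copies the bytes between position and limit without moving the caller's position.
    static byte[] remainingBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // independent position/limit, shared content
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] backing = "xxabcxx".getBytes(StandardCharsets.UTF_8);
        // Only the 3 bytes starting at offset 2 are the readable region.
        ByteBuffer buffer = ByteBuffer.wrap(backing, 2, 3);
        System.out.println(new String(remainingBytes(buffer), StandardCharsets.UTF_8));
        System.out.println(buffer.array().length); // length of the full backing array
    }
}
```

Whether this matters depends on how the event library constructs the buffer; if it always wraps exactly the payload, `array()` and `remainingBytes` give the same bytes.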

2 answers:

Answer 0 (score: 0)

The data returned by the Lambda function is not in the correct format.

Take a look at the following example:

'use strict';
console.log('Loading function');

/* Stock Ticker format parser */
const parser = /^\{\"TICKER_SYMBOL\"\:\"[A-Z]+\"\,\"SECTOR\"\:"[A-Z]+\"\,\"CHANGE\"\:[-.0-9]+\,\"PRICE\"\:[-.0-9]+\}/;

exports.handler = (event, context, callback) => {
    let success = 0; // Number of valid entries found
    let failure = 0; // Number of invalid entries found
    let dropped = 0; // Number of dropped entries 

    /* Process the list of records and transform them */
    const output = event.records.map((record) => {

        /* Buffer.from replaces the deprecated new Buffer(...) constructor */
        const entry = Buffer.from(record.data, 'base64').toString('utf8');
        const match = parser.exec(entry);
        if (match) {
            /* match[0] is the full matched text; parse it as JSON */
            const parsed_match = JSON.parse(match[0]);
            const milliseconds = new Date().getTime();
            /* Add timestamp and convert to CSV */
            const result = `${milliseconds},${parsed_match.TICKER_SYMBOL},${parsed_match.SECTOR},${parsed_match.CHANGE},${parsed_match.PRICE}`+"\n";
            const payload = Buffer.from(result, 'utf8').toString('base64');
            if (parsed_match.SECTOR != 'RETAIL') {
                /* Dropped event, notify and leave the record intact */
                dropped++;
                return {
                    recordId: record.recordId,
                    result: 'Dropped',
                    data: record.data,
                };
            }
            else {
                /* Transformed event */
                success++;  
                return {
                    recordId: record.recordId,
                    result: 'Ok',
                    data: payload,
                };
            }
        }
        else {
            /* Failed event, notify the error and leave the record intact */
            console.log("Failed event : "+ record.data);
            failure++;
            return {
                recordId: record.recordId,
                result: 'ProcessingFailed',
                data: record.data,
            };
        }
        /* This transformation is the "identity" transformation, the data is left intact 
        return {
            recordId: record.recordId,
            result: 'Ok',
            data: record.data,
        } */
    });
    console.log(`Processing completed. Successful records ${success}, dropped ${dropped}, failed ${failure}.`);
    callback(null, { records: output });
};

The following post has more details about the format of the returned data:

https://aws.amazon.com/blogs/compute/amazon-kinesis-firehose-data-transformation-with-aws-lambda/
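
For the Java handler in the question, the same response shape could be sketched as follows. Each returned record carries only `recordId`, `result` (`Ok`, `Dropped`, or `ProcessingFailed`) and base64-encoded `data`, and a record that fails to parse is returned as `ProcessingFailed` rather than left out of the response. This is only a sketch that uses plain maps instead of the AWS event classes; `FirehoseResponseSketch`, `outputRecord`, `buildRecords`, and the `toJson` function are hypothetical names, and the convention that `toJson` returns null on failure is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FirehoseResponseSketch {

    // Builds one entry of the "records" array: recordId, result, base64 data.
    static Map<String, String> outputRecord(String recordId, String result, byte[] data) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("recordId", recordId);
        out.put("result", result);
        out.put("data", Base64.getEncoder().encodeToString(data));
        return out;
    }

    // input maps recordId -> raw payload.
    // toJson returns null when the payload cannot be parsed (assumption).
    static List<Map<String, String>> buildRecords(Map<String, byte[]> input,
                                                  Function<byte[], String> toJson) {
        List<Map<String, String>> records = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : input.entrySet()) {
            String json = toJson.apply(entry.getValue());
            if (json != null) {
                records.add(outputRecord(entry.getKey(), "Ok",
                        json.getBytes(StandardCharsets.UTF_8)));
            } else {
                // Return the original payload so every incoming recordId is accounted for.
                records.add(outputRecord(entry.getKey(), "ProcessingFailed",
                        entry.getValue()));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, byte[]> input = new LinkedHashMap<>();
        input.put("id-1", "hello".getBytes(StandardCharsets.UTF_8));
        input.put("id-2", new byte[0]); // treated as unparseable in this demo

        Map<String, Object> response = new LinkedHashMap<>();
        response.put("records", buildRecords(input,
                bytes -> bytes.length == 0
                        ? null
                        : "{\"value\":\"" + new String(bytes, StandardCharsets.UTF_8) + "\"}"));
        System.out.println(response);
    }
}
```

In a real handler the map of inputs would come from `KinesisFirehoseEvent.getRecords()`, and the returned structure would be serialized by the Lambda runtime.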

Hope this helps.

Answer 1 (score: 0)

I got this working with just the code above. It looks like the stream was slow, so it had not yet reached a new hour.
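
To illustrate why a slow stream can leave everything in one folder: Firehose's default S3 prefix is a UTC timestamp of the form `YYYY/MM/dd/HH`, so objects written within the same UTC hour share a prefix. The sketch below only mimics that layout with `java.time`; `HourlyPrefix` and `prefixFor` are hypothetical names, and which timestamp Firehose picks for a given object is not modeled here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HourlyPrefix {

    // Same layout as the default hourly folders: year/month/day/hour in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

    static String prefixFor(Instant instant) {
        return FORMAT.format(instant) + "/";
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2018-09-05T18:10:00Z");
        Instant sameHour = Instant.parse("2018-09-05T18:59:59Z");
        Instant nextHour = Instant.parse("2018-09-05T19:00:00Z");

        System.out.println(prefixFor(first));    // 2018/09/05/18/
        System.out.println(prefixFor(sameHour)); // 2018/09/05/18/
        System.out.println(prefixFor(nextHour)); // 2018/09/05/19/
    }
}
```

Until the clock crosses into the next UTC hour, every delivered object lands under the same prefix, which matches what this answer observed.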