AWS Kinesis .NET Consumer

Date: 2019-05-03 20:15:57

Tags: c# amazon-web-services .net-core amazon-kinesis

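I am experimenting with AWS Kinesis using a producer and a consumer. The problem is that the consumer keeps receiving the first message (record) we produced, even though we have changed the data object being sent several times. We have also tried several ShardIteratorType values, but none of them worked: LATEST returns no results, and all the others return the same original record.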

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.Internal;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using BenchmarkRuleSetModel.Models;
using MongoDB.Driver;
using Newtonsoft.Json;

namespace ConsoleApp7
{
    internal class Program
    {
        private static AmazonKinesisClient _client;
        private static string _streamName;

        static async Task ReadFromStream()
        {
            var kinesisStreamName = _streamName;

            var describeRequest = new DescribeStreamRequest
            {
                StreamName = kinesisStreamName,
            };

            var describeResponse = await _client.DescribeStreamAsync(describeRequest);
            var shards = describeResponse.StreamDescription.Shards;

            foreach (var shard in shards)
            {
                var iteratorRequest = new GetShardIteratorRequest
                {
                    StreamName = kinesisStreamName,
                    ShardId = shard.ShardId,
                    ShardIteratorType = ShardIteratorType.AT_TIMESTAMP,
                    Timestamp = DateTime.MinValue
                };

                var iteratorResponse = await _client.GetShardIteratorAsync(iteratorRequest);
                var iteratorId = iteratorResponse.ShardIterator;

                while (!string.IsNullOrEmpty(iteratorId))
                {
                    var getRequest = new GetRecordsRequest
                    {
                        ShardIterator = iteratorId, Limit = 10000
                    };

                    var getResponse = await _client.GetRecordsAsync(getRequest);
                    var nextIterator = getResponse.NextShardIterator;
                    var records = getResponse.Records;

                    if (records.Count > 0)
                    {
                        Console.WriteLine("Received {0} records. ", records.Count);
                        foreach (var record in records)
                        {
                            var json = Encoding.UTF8.GetString(record.Data.ToArray());
                            Console.WriteLine("Json string: " + json);
                        }
                    }

                    iteratorId = nextIterator;
                }
            }
        }

        private static async Task<string> Produce()
        {
            var data = new
            {
                Message = "Hello world!",
                Author = "Amir"
            };

            //convert to byte array in prep for adding to stream
            var oByte = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(data));

            using (var ms = new MemoryStream(oByte))
            {
                //create put request
                var requestRecord = new PutRecordRequest
                {
                    StreamName = _streamName,
                    PartitionKey = Guid.NewGuid().ToString(),
                    Data = ms
                };
                //list name of Kinesis stream
                //give partition key that is used to place record in particular shard
                //add record as memorystream

                //PUT the record to Kinesis
                var response = await _client.PutRecordAsync(requestRecord);

                return response.SequenceNumber;
            }
        }

        static void Main(string[] args)
        {
            _client = new AmazonKinesisClient("ExampleKey", "ExampleSecret", RegionEndpoint.EUWest2);

            _streamName = "SomeStream";

            Produce().Wait();

            ReadFromStream().Wait();
        }
    }
}

1 Answer:

Answer 0 (score: 2)

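First of all, when debugging your code, I noticed that it loops forever in the inner loop (while (!string.IsNullOrEmpty(iteratorId))) and never gets to iterate over all the shards in your stream (assuming you have more than one). The reason is explained in https://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html#getrecords-returns-empty: because the producer never called MergeShards or SplitShard, the shards stay open, so NextShardIterator is never null.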

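That is why you only see the records that were put on the first shard (or at least that is what I saw when running your code): you have to read from the shards in parallel.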
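
A minimal sketch of reading the shards in parallel, with one polling task per shard. The names ParallelShardReader, ReadAllShardsAsync and ReadShardAsync are hypothetical, and the TRIM_HORIZON start position, the Limit of 1000 and the one-second pause between polls are illustrative choices rather than anything from the question. It also assumes a single DescribeStream call lists every shard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

internal static class ParallelShardReader
{
    // Starts one polling task per shard and waits for all of them.
    // Open shards never return a null NextShardIterator, so the tasks run
    // until the token is cancelled; cancellation surfaces to the caller as
    // an OperationCanceledException.
    public static async Task ReadAllShardsAsync(
        IAmazonKinesis client, string streamName, CancellationToken cancel)
    {
        if (client == null) throw new ArgumentNullException(nameof(client));

        var describe = await client.DescribeStreamAsync(
            new DescribeStreamRequest { StreamName = streamName }, cancel);

        // Assumption: one DescribeStream page lists every shard
        // (HasMoreShards is not handled in this sketch).
        var readers = describe.StreamDescription.Shards
            .Select(shard => ReadShardAsync(client, streamName, shard.ShardId, cancel))
            .ToList();

        await Task.WhenAll(readers);
    }

    private static async Task ReadShardAsync(
        IAmazonKinesis client, string streamName, string shardId, CancellationToken cancel)
    {
        var iteratorResponse = await client.GetShardIteratorAsync(
            new GetShardIteratorRequest
            {
                StreamName = streamName,
                ShardId = shardId,
                ShardIteratorType = ShardIteratorType.TRIM_HORIZON
            }, cancel);

        var iterator = iteratorResponse.ShardIterator;

        while (!string.IsNullOrEmpty(iterator))
        {
            var response = await client.GetRecordsAsync(
                new GetRecordsRequest { ShardIterator = iterator, Limit = 1000 }, cancel);

            // Records may be empty even while the shard is open; that is normal.
            foreach (var record in response.Records ?? new List<Record>())
            {
                var json = Encoding.UTF8.GetString(record.Data.ToArray());
                Console.WriteLine("[{0}] {1}: {2}", shardId, record.SequenceNumber, json);
            }

            iterator = response.NextShardIterator;

            // Pause between polls to stay under the per-shard GetRecords rate limit.
            await Task.Delay(TimeSpan.FromSeconds(1), cancel);
        }
    }
}
```

From the question's Main, this could be called with the existing _client and _streamName and a CancellationTokenSource that decides how long to keep consuming, catching OperationCanceledException when the token fires.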

As for your usage pattern, you are using:

ShardIteratorType = ShardIteratorType.AT_TIMESTAMP,
Timestamp = DateTime.MinValue

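With this, you are effectively telling Kinesis "give me all the records in the stream from the beginning of time" (or at least as far back as the retention period reaches). That is why you keep seeing the same old records in addition to the new ones (again, this is what I saw when running your code).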

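A GetRecords[Async] call does not actually remove records from the stream (see https://stackoverflow.com/a/25741304/4940707). The correct way to use Kinesis is to move from checkpoint to checkpoint. If the consumer persists the SequenceNumber of the last record it read and then restarts like this: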

ShardIteratorType = ShardIteratorType.AT_SEQUENCE_NUMBER,
StartingSequenceNumber = lastSeenSequenceNumber

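then you will only see records from that point onward. Note that AT_SEQUENCE_NUMBER starts at the saved record itself, so that one record is delivered again; use ShardIteratorType.AFTER_SEQUENCE_NUMBER to start with the record right after it.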
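
One way to sketch the checkpointing idea is shown here. FileCheckpointStore and CheckpointedIterator are hypothetical helpers, not part of the AWS SDK, and the tab-separated file format is an assumption made for illustration. The sketch uses AFTER_SEQUENCE_NUMBER instead of AT_SEQUENCE_NUMBER so the last processed record is not delivered twice, and falls back to TRIM_HORIZON when no checkpoint exists yet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

// Hypothetical helper: keeps one "last seen" sequence number per shard in a
// plain text file, one "shardId<TAB>sequenceNumber" pair per line.
// Not thread-safe; guard it with a lock if several shard readers share it.
internal sealed class FileCheckpointStore
{
    private readonly string _path;
    private readonly Dictionary<string, string> _checkpoints =
        new Dictionary<string, string>();

    public FileCheckpointStore(string path)
    {
        _path = path;
        if (!File.Exists(path)) return;

        foreach (var line in File.ReadAllLines(path))
        {
            var parts = line.Split('\t');
            if (parts.Length == 2) _checkpoints[parts[0]] = parts[1];
        }
    }

    // Returns null when the shard has never been checkpointed.
    public string Load(string shardId)
    {
        return _checkpoints.TryGetValue(shardId, out var sequenceNumber)
            ? sequenceNumber
            : null;
    }

    // Records the checkpoint in memory and rewrites the whole file.
    public void Save(string shardId, string sequenceNumber)
    {
        _checkpoints[shardId] = sequenceNumber;

        var lines = new List<string>();
        foreach (var pair in _checkpoints) lines.Add(pair.Key + "\t" + pair.Value);
        File.WriteAllLines(_path, lines);
    }
}

internal static class CheckpointedIterator
{
    // Builds the iterator request for a shard: resume right after the saved
    // sequence number, or start from the oldest retained record if none is saved.
    public static GetShardIteratorRequest BuildRequest(
        string streamName, string shardId, string lastSeenSequenceNumber)
    {
        var request = new GetShardIteratorRequest
        {
            StreamName = streamName,
            ShardId = shardId
        };

        if (lastSeenSequenceNumber == null)
        {
            request.ShardIteratorType = ShardIteratorType.TRIM_HORIZON;
        }
        else
        {
            request.ShardIteratorType = ShardIteratorType.AFTER_SEQUENCE_NUMBER;
            request.StartingSequenceNumber = lastSeenSequenceNumber;
        }

        return request;
    }
}
```

In the consumer loop, the idea would be to call Save(shard.ShardId, record.SequenceNumber) after a record has been processed, and to pass Load(shard.ShardId) into BuildRequest on the next start. Saving only after processing means a crash can replay a record but should not skip one.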