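I am experimenting with AWS Kinesis using a producer and a consumer. The problem is that the consumer keeps receiving the first message (or record) we produced, even though we have changed the data object being sent several times. We have also tried several ShardIteratorType values, but none of them worked: LATEST returns no results, and all the others return the same original record.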
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.Internal;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using BenchmarkRuleSetModel.Models;
using MongoDB.Driver;
using Newtonsoft.Json;
namespace ConsoleApp7
{
    internal class Program
    {
        private static AmazonKinesisClient _client;
        private static string _streamName;

        static async Task ReadFromStream()
        {
            var kinesisStreamName = _streamName;
            var describeRequest = new DescribeStreamRequest
            {
                StreamName = kinesisStreamName,
            };
            var describeResponse = await _client.DescribeStreamAsync(describeRequest);
            var shards = describeResponse.StreamDescription.Shards;
            foreach (var shard in shards)
            {
                var iteratorRequest = new GetShardIteratorRequest
                {
                    StreamName = kinesisStreamName,
                    ShardId = shard.ShardId,
                    ShardIteratorType = ShardIteratorType.AT_TIMESTAMP,
                    Timestamp = DateTime.MinValue
                };
                var iteratorResponse = await _client.GetShardIteratorAsync(iteratorRequest);
                var iteratorId = iteratorResponse.ShardIterator;
                while (!string.IsNullOrEmpty(iteratorId))
                {
                    var getRequest = new GetRecordsRequest
                    {
                        ShardIterator = iteratorId,
                        Limit = 10000
                    };
                    var getResponse = await _client.GetRecordsAsync(getRequest);
                    var nextIterator = getResponse.NextShardIterator;
                    var records = getResponse.Records;
                    if (records.Count > 0)
                    {
                        Console.WriteLine("Received {0} records. ", records.Count);
                        foreach (var record in records)
                        {
                            var json = Encoding.UTF8.GetString(record.Data.ToArray());
                            Console.WriteLine("Json string: " + json);
                        }
                    }
                    iteratorId = nextIterator;
                }
            }
        }

        private static async Task<string> Produce()
        {
            var data = new
            {
                Message = "Hello world!",
                Author = "Amir"
            };
            //convert to byte array in prep for adding to stream
            var oByte = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(data));
            using (var ms = new MemoryStream(oByte))
            {
                //create put request
                var requestRecord = new PutRecordRequest
                {
                    StreamName = _streamName,
                    PartitionKey = Guid.NewGuid().ToString(),
                    Data = ms
                };
                //list name of Kinesis stream
                //give partition key that is used to place record in particular shard
                //add record as memorystream
                //PUT the record to Kinesis
                var response = await _client.PutRecordAsync(requestRecord);
                return response.SequenceNumber;
            }
        }

        static void Main(string[] args)
        {
            _client = new AmazonKinesisClient("ExampleKey", "ExampleSecret", RegionEndpoint.EUWest2);
            _streamName = "SomeStream";
            Produce().Wait();
            ReadFromStream().Wait();
        }
    }
}
Answer 0 (score: 2)
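First, while debugging the code I noticed that it loops forever in the inner loop (while (!string.IsNullOrEmpty(iteratorId))) and never gets around to the remaining shards in the stream (assuming you have more than one). The reason is explained at https://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html#getrecords-returns-empty: because the producer never called MergeShards or SplitShards, the shards stay open, so NextShardIterator is never NULL.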
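That is why you only see the records that were put on the first shard (or at least that is what I saw when running the code). You have to read the shards in parallel.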
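Reading the shards in parallel could look roughly like the sketch below. It is only an illustration under a few assumptions: the class and method names (ParallelShardReader, ReadShardAsync, ReadAllShardsAsync, Decode) are made up for this example, the one-second pause between polls is an arbitrary choice, and a CancellationToken is used to end the loops because NextShardIterator never becomes null on an open shard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

internal static class ParallelShardReader
{
    // Hypothetical helper: turns the payload of a record back into a string.
    public static string Decode(Record record)
    {
        return Encoding.UTF8.GetString(record.Data.ToArray());
    }

    // Polls a single shard until the token is cancelled or the shard is closed.
    private static async Task ReadShardAsync(
        IAmazonKinesis client, string streamName, string shardId, CancellationToken token)
    {
        try
        {
            var iteratorResponse = await client.GetShardIteratorAsync(new GetShardIteratorRequest
            {
                StreamName = streamName,
                ShardId = shardId,
                ShardIteratorType = ShardIteratorType.TRIM_HORIZON
            }, token);

            var iterator = iteratorResponse.ShardIterator;
            while (!string.IsNullOrEmpty(iterator) && !token.IsCancellationRequested)
            {
                var response = await client.GetRecordsAsync(
                    new GetRecordsRequest { ShardIterator = iterator, Limit = 1000 }, token);

                // Records may be empty (or null in some SDK versions) on any given poll.
                foreach (var record in response.Records ?? new List<Record>())
                {
                    Console.WriteLine(shardId + ": " + Decode(record));
                }

                iterator = response.NextShardIterator;
                // Pause between polls so the per-shard read limits are not exceeded.
                await Task.Delay(TimeSpan.FromSeconds(1), token);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the expected way to stop reading an open shard.
        }
    }

    // Starts one reader per shard and waits for all of them to finish.
    public static async Task ReadAllShardsAsync(
        IAmazonKinesis client, string streamName, CancellationToken token)
    {
        var description = await client.DescribeStreamAsync(
            new DescribeStreamRequest { StreamName = streamName }, token);

        var readers = description.StreamDescription.Shards
            .Select(shard => ReadShardAsync(client, streamName, shard.ShardId, token));
        await Task.WhenAll(readers);
    }
}
```

A caller would typically create a CancellationTokenSource, pass its Token to ReadAllShardsAsync, and cancel it (for example with CancelAfter) when the consumer should stop.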
As for your usage pattern, you are using:
ShardIteratorType = ShardIteratorType.AT_TIMESTAMP,
Timestamp = DateTime.MinValue
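This way you are effectively telling Kinesis "give me every record in the stream since the beginning of time" (or at least as far back as the retention period reaches). That is why you keep seeing the same old records in addition to the new ones (again, this is what I saw when I ran the code).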
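For comparison, here is a small sketch of two other starting positions. TRIM_HORIZON asks for the oldest record still retained in the shard, which has the same practical effect as AT_TIMESTAMP with DateTime.MinValue. LATEST starts just after the most recent record at the moment the iterator is requested, so a record written before GetShardIterator is called (as in the question, where Produce() runs first) is not returned. The class and method names are made up for this illustration.

```csharp
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

internal static class IteratorRequests
{
    // Hypothetical helper: start at the oldest record still retained in the shard.
    public static GetShardIteratorRequest FromOldest(string streamName, string shardId)
    {
        return new GetShardIteratorRequest
        {
            StreamName = streamName,
            ShardId = shardId,
            ShardIteratorType = ShardIteratorType.TRIM_HORIZON
        };
    }

    // Hypothetical helper: start just after the newest record, so only records
    // written after the iterator was obtained are returned.
    public static GetShardIteratorRequest OnlyNew(string streamName, string shardId)
    {
        return new GetShardIteratorRequest
        {
            StreamName = streamName,
            ShardId = shardId,
            ShardIteratorType = ShardIteratorType.LATEST
        };
    }
}
```

With OnlyNew, the iterator would have to be requested before the producer writes, otherwise the consumer has nothing to return.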
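The GetRecords[Async] call does not actually remove records from the stream (see https://stackoverflow.com/a/25741304/4940707). The proper way to use Kinesis is to move from checkpoint to checkpoint. If the consumer persists the SequenceNumber of the last record it read and then restarts like this: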
ShardIteratorType = ShardIteratorType.AT_SEQUENCE_NUMBER,
StartingSequenceNumber = lastSeenSequenceNumber
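then you will only see records from that checkpoint onward. Note that AT_SEQUENCE_NUMBER starts at the record with that sequence number, so the last record seen is returned once more; ShardIteratorType.AFTER_SEQUENCE_NUMBER starts just after it and returns only newer records.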
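A minimal sketch of that checkpointing idea follows. It assumes the checkpoint is simply a string the caller stores somewhere durable between runs; CheckpointedReader, BuildIteratorRequest and ReadUntilCaughtUpAsync are names invented for this example, and the 250 ms pause is an arbitrary choice. It uses AFTER_SEQUENCE_NUMBER rather than AT_SEQUENCE_NUMBER so that the checkpointed record itself is not read again.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

internal static class CheckpointedReader
{
    // Builds the iterator request from the last checkpoint (null or empty on the first run).
    public static GetShardIteratorRequest BuildIteratorRequest(
        string streamName, string shardId, string lastSeenSequenceNumber)
    {
        var request = new GetShardIteratorRequest { StreamName = streamName, ShardId = shardId };
        if (string.IsNullOrEmpty(lastSeenSequenceNumber))
        {
            // No checkpoint yet: start from the oldest retained record.
            request.ShardIteratorType = ShardIteratorType.TRIM_HORIZON;
        }
        else
        {
            // Resume just after the last record that was processed.
            request.ShardIteratorType = ShardIteratorType.AFTER_SEQUENCE_NUMBER;
            request.StartingSequenceNumber = lastSeenSequenceNumber;
        }
        return request;
    }

    // Reads the shard until it has caught up with the tip and returns the new checkpoint.
    public static async Task<string> ReadUntilCaughtUpAsync(
        IAmazonKinesis client, string streamName, string shardId, string checkpoint)
    {
        var iteratorResponse = await client.GetShardIteratorAsync(
            BuildIteratorRequest(streamName, shardId, checkpoint));
        var iterator = iteratorResponse.ShardIterator;

        while (!string.IsNullOrEmpty(iterator))
        {
            var response = await client.GetRecordsAsync(
                new GetRecordsRequest { ShardIterator = iterator, Limit = 1000 });

            foreach (var record in response.Records ?? new List<Record>())
            {
                Console.WriteLine(Encoding.UTF8.GetString(record.Data.ToArray()));
                checkpoint = record.SequenceNumber; // remember the last processed record
            }

            // MillisBehindLatest is 0 once the reader has reached the tip of the shard.
            if (response.MillisBehindLatest == 0)
            {
                break;
            }

            iterator = response.NextShardIterator;
            await Task.Delay(250); // stay under the per-shard read limits
        }

        return checkpoint;
    }
}
```

The returned value would be saved by the caller (a file, a database row, and so on) and passed back in on the next run, so each run only processes records it has not seen before.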