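I'm trying to speed up some processes that post large numbers of records (usually millions) to Elasticsearch. In my C# code I've implemented a multithreaded solution using Dataflow as the scaffolding: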
var fetchRecords = new TransformBlock<?, ?>(start => { ... });
var sendRecordsToElastic = new ActionBlock<List<?>>(records => sendBulkRequest(records));
fetchRecords.LinkTo(sendRecordsToElastic, new DataflowLinkOptions { PropagateCompletion = true });
fetchRecords.Post("Start");
Then I want to implement the call that sends the bulk request:
public IBulkResponse sendBulkRequest(List<?> records)
{
    lock(SomeStaticObject)
    {
        // Execute several new threads to send records in bulk
    }
}
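My question is about the practicality of spawning additional threads inside a lock that itself runs as part of a Dataflow pipeline.
Is this OK? Could I run into any problems in terms of performance, execution, cache/memory misses, and so on?
Any insight would be gladly accepted.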
Answer 0 (score: 1)
You may want to use BulkAll here, which implements the observable pattern to make concurrent bulk requests to Elasticsearch. Here is an example:
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);
    var client = new ElasticClient(connectionSettings);
    var indexName = "bulk-index";

    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);

    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(3)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<DeviceStatus>(p => p.AutoMap())
        )
    );

    var size = 500;

    // set up the observable
    var bulkAllObservable = client.BulkAll(GetDeviceStatus(), b => b
        .Index(indexName)
        .MaxDegreeOfParallelism(4)
        .RefreshOnCompleted()
        .Size(size)
    );

    var countdownEvent = new CountdownEvent(1);
    Exception exception = null;

    // set up an observer. Delegates passed are:
    // 1. onNext
    // 2. onError
    // 3. onCompleted
    var bulkAllObserver = new BulkAllObserver(
        response => Console.WriteLine($"Indexed {response.Page * size} with {response.Retries} retries"),
        ex =>
        {
            // capture exception for throwing outside Observer.
            // You may decide to do something different here
            exception = ex;
            countdownEvent.Signal();
        },
        () =>
        {
            Console.WriteLine("Finished");
            countdownEvent.Signal();
        });

    // subscribe to the observable
    bulkAllObservable.Subscribe(bulkAllObserver);

    // wait indefinitely for it to finish. May want to put a
    // max timeout on this
    countdownEvent.Wait();

    if (exception != null)
    {
        throw exception;
    }
}
// lazily enumerated collection
private static IEnumerable<DeviceStatus> GetDeviceStatus()
{
    for (var i = 0; i < DocumentCount; i++)
        yield return new DeviceStatus(i);
}

private const int DocumentCount = 20000;

public class DeviceStatus
{
    public DeviceStatus(int id) => Id = id;

    public int Id { get; set; }
}
If you don't need to do anything special in the observer, you can use the .Wait() method on the observable:
var bulkAllObservable = client.BulkAll(GetDeviceStatus(), b => b
        .Index(indexName)
        .MaxDegreeOfParallelism(4)
        .RefreshOnCompleted()
        .Size(size)
    )
    .Wait(
        TimeSpan.FromHours(1),
        response => Console.WriteLine($"Indexed {response.Page * size} with {response.Retries} retries")
    );
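To make the roles of Size and MaxDegreeOfParallelism a bit more concrete, here is a minimal, self-contained sketch of the general idea: split a lazily enumerated source into batches and keep at most N batches in flight at once. This is not how NEST implements BulkAll. The names BoundedBulkSketch, Batch, SendAllAsync and SendBatchAsync are hypothetical, and SendBatchAsync only simulates a bulk request with a short delay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedBulkSketch
{
    // Hypothetical stand-in for one bulk request; a real sender would call the client here.
    public static async Task<int> SendBatchAsync(IReadOnlyList<int> batch)
    {
        await Task.Delay(10); // simulated network latency
        return batch.Count;
    }

    // Lazily split a sequence into lists of at most `size` items.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        var current = new List<T>(size);
        foreach (var item in source)
        {
            current.Add(item);
            if (current.Count == size)
            {
                yield return current;
                current = new List<T>(size);
            }
        }
        if (current.Count > 0)
            yield return current; // final partial batch
    }

    // Send every batch, with at most `maxDegreeOfParallelism` requests in flight.
    public static async Task<int> SendAllAsync(IEnumerable<int> source, int size, int maxDegreeOfParallelism)
    {
        using (var gate = new SemaphoreSlim(maxDegreeOfParallelism))
        {
            var inFlight = new List<Task<int>>();
            foreach (var batch in Batch(source, size))
            {
                await gate.WaitAsync(); // blocks enumeration while the limit is reached
                inFlight.Add(SendOneAsync(batch, gate));
            }
            var counts = await Task.WhenAll(inFlight);
            return counts.Sum();
        }
    }

    private static async Task<int> SendOneAsync(IReadOnlyList<int> batch, SemaphoreSlim gate)
    {
        try { return await SendBatchAsync(batch); }
        finally { gate.Release(); } // free a slot even if the request fails
    }

    public static void Main()
    {
        var total = SendAllAsync(Enumerable.Range(0, 20000), 500, 4).GetAwaiter().GetResult();
        Console.WriteLine($"Sent {total} documents"); // prints "Sent 20000 documents"
    }
}
```

Because the source is only enumerated when a slot is free, the whole data set never has to sit in memory at once. For millions of records you would probably also want to prune completed tasks from the list, plus retries and back-off, which is the kind of thing BulkAll already takes care of.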
BulkAll, ScrollAll, and Reindex all have observable methods (although there is also ReindexOnServer, which reindexes within Elasticsearch and maps to the Reindex API; the Reindex method predates it).
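As for the lock in the question: an ActionBlock processes one message at a time by default (its MaxDegreeOfParallelism is 1), so if nothing else takes that lock it is redundant, and threads started inside it are invisible to the pipeline's own throttling. If you would rather stay with Dataflow than switch to BulkAll, one option is to let the blocks do the batching and limit the concurrency. A rough sketch under assumptions: DataflowWithoutLock and RunAsync are hypothetical names, records are plain ints, and the bulk call is simulated with a delay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowWithoutLock
{
    // Batches the records and sends up to `parallelism` batches concurrently.
    // Returns the number of records "sent" by the simulated bulk call.
    public static async Task<int> RunAsync(IEnumerable<int> records, int batchSize, int parallelism)
    {
        var sent = 0;

        // Groups individual records into arrays of `batchSize`.
        var batchRecords = new BatchBlock<int>(
            batchSize,
            new GroupingDataflowBlockOptions { BoundedCapacity = batchSize * parallelism });

        // No lock and no manual threads: the block itself limits concurrency.
        var sendRecordsToElastic = new ActionBlock<int[]>(
            async batch =>
            {
                await Task.Delay(10); // assumption: stand-in for the real bulk request
                Interlocked.Add(ref sent, batch.Length);
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = parallelism,
                BoundedCapacity = parallelism * 2 // back-pressure towards the producer
            });

        batchRecords.LinkTo(sendRecordsToElastic, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var record in records)
            await batchRecords.SendAsync(record); // waits while the pipeline is full

        batchRecords.Complete();               // flushes the final partial batch
        await sendRecordsToElastic.Completion; // completes once every batch was handled
        return sent;
    }

    public static void Main()
    {
        var total = RunAsync(Enumerable.Range(0, 1250), 500, 4).GetAwaiter().GetResult();
        Console.WriteLine(total); // prints 1250
    }
}
```

The bounded capacities keep the producer from running far ahead of Elasticsearch, so memory use stays roughly proportional to batchSize times parallelism rather than to the total record count.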