Unwrapping IObservable<Task<T>> into IObservable<T>

Time: 2017-04-10 02:51:55

Tags: c# task-parallel-library .net-4.5 system.reactive rx.net

Is there a way to unwrap the IObservable<Task<T>> into IObservable<T> keeping the same order of events, like this?

Tasks:  ----a-------b--c----------d------e---f---->
Values: -------A-----------B--C------D-----E---F-->

Let's say I have a desktop application that consumes a stream of messages, some of which require heavy post-processing:

IObservable<Message> streamOfMessages = ...;

IObservable<Task<Result>> streamOfTasks = streamOfMessages
    .Select(async msg => await PostprocessAsync(msg));

IObservable<Result> streamOfResults = ???; // unwrap streamOfTasks

I imagine two ways of dealing with that.

First, I can subscribe to streamOfTasks using the asynchronous event handler:

streamOfTasks.Subscribe(async task =>
{
    var result = await task;
    Display(result);
});

Second, I can convert streamOfTasks using Observable.Create, like this:

var streamOfResults =
    from task in streamOfTasks
    from value in Observable.Create<T>(async (obs, cancel) =>
    {
        var v = await task;
        obs.OnNext(v);

        // TODO: don't know when to call obs.OnCompleted()
    })
    select value;

streamOfResults.Subscribe(result => Display(result));

Either way, the order of messages is not preserved: some later messages that don't need any post-processing come out faster than earlier messages that require post-processing. Both my solutions handle the incoming messages in parallel, but I'd like them to be processed sequentially, one by one.

I can write a simple task queue to process just one task at a time, but perhaps that's overkill. It seems to me that I'm missing something obvious.


UPD. I wrote a sample console program to demonstrate my approaches. None of the solutions so far preserves the original order of events. Here is the output of the program:

Timer: 0
Timer: 1
Async handler: 1
Observable.Create: 1
Observable.FromAsync: 1
Timer: 2
Async handler: 2
Observable.Create: 2
Observable.FromAsync: 2
Observable.Create: 0
Async handler: 0
Observable.FromAsync: 0

Here is the complete source code:

// "C:\Program Files (x86)\MSBuild\14.0\Bin\csc.exe" test.cs /r:System.Reactive.Core.dll /r:System.Reactive.Linq.dll /r:System.Reactive.Interfaces.dll

using System;
using System.Reactive;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        Console.WriteLine("Press ENTER to exit.");

        // the source stream
        var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1));
        timerEvents.Subscribe(x => Console.WriteLine($"Timer: {x}"));

        // solution #1: using async event handler
        timerEvents.Subscribe(async x =>
        {
            var result = await PostprocessAsync(x);
            Console.WriteLine($"Async handler: {x}");
        });

        // solution #2: using Observable.Create
        var processedEventsV2 =
            from task in timerEvents.Select(async x => await PostprocessAsync(x))
            from value in Observable.Create<long>(async (obs, cancel) =>
            {
                var v = await task;
                obs.OnNext(v);
            })
            select value;
        processedEventsV2.Subscribe(x => Console.WriteLine($"Observable.Create: {x}"));

        // solution #3: using FromAsync, as answered by @Enigmativity
        var processedEventsV3 =
            from msg in timerEvents
            from result in Observable.FromAsync(() => PostprocessAsync(msg))
            select result;

        processedEventsV3.Subscribe(x => Console.WriteLine($"Observable.FromAsync: {x}"));

        Console.ReadLine();
    }

    static async Task<long> PostprocessAsync(long x)
    {
        // some messages require long post-processing
        if (x % 3 == 0)
        {
            await Task.Delay(TimeSpan.FromSeconds(2.5));
        }

        // and some don't
        return x;
    }
}

5 answers:

Answer 0 (score: 2)

Combining @Enigmativity's simple approach with @VMAtm's idea of attaching a counter and some code snippets from this SO question, I came up with this solution:

// usage
var processedStream = timerEvents.SelectAsync(async t => await PostprocessAsync(t));

processedStream.Subscribe(x => Console.WriteLine($"Processed: {x}"));

// my sample console program prints the events ordered properly:
Timer: 0
Timer: 1
Timer: 2
Processed: 0
Processed: 1
Processed: 2
Timer: 3
Timer: 4
Timer: 5
Processed: 3
Processed: 4
Processed: 5
....

Here is my SelectAsync extension method, which transforms IObservable<Task<TSource>> into IObservable<TResult>, preserving the original order of events:

public static IObservable<TResult> SelectAsync<TSource, TResult>(
    this IObservable<TSource> src,
    Func<TSource, Task<TResult>> selectorAsync)
{
    // using local variable for counter is easier than src.Scan(...)
    var counter = 0;
    var streamOfTasks =
        from source in src
        from result in Observable.FromAsync(async () => new
        {
            Index = Interlocked.Increment(ref counter) - 1,
            Result = await selectorAsync(source)
        })
        select result;

    // buffer the results coming out of order
    return Observable.Create<TResult>(observer =>
    {
        var index = 0;
        var buffer = new Dictionary<int, TResult>();

        return streamOfTasks.Subscribe(item =>
        {
            buffer.Add(item.Index, item.Result);

            TResult result;
            while (buffer.TryGetValue(index, out result))
            {
                buffer.Remove(index);
                observer.OnNext(result);
                index++;
            }
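        },
        // forward errors and completion to the downstream observer
        observer.OnError,
        observer.OnCompleted);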
    });
}

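I'm not particularly satisfied with my solution because it looks too complex to me, but at least it doesn't require any external dependencies. I'm using a simple Dictionary here to buffer and reorder the task results, because the subscriber need not be thread-safe (the subscription callbacks are never called concurrently).

Any comments or suggestions are welcome. I still hope to find a native Rx way of doing this without a custom buffering extension method.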

Answer 1 (score: 1)

Is the following simple approach an answer for you?

IObservable<Result> streamOfResults =
    from msg in streamOfMessages
    from result in Observable.FromAsync(() => PostprocessAsync(msg))
    select result;

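Answer 2 (score: 1)

To maintain the order of events, you can funnel your stream into a TransformBlock from TPL Dataflow. The TransformBlock executes your post-processing logic and, by default, maintains the order of its output.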

using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using NUnit.Framework;

namespace HandlingStreamInOrder {

    [TestFixture]
    public class ItemHandlerTests {

        [Test]
        public async Task Items_Are_Output_In_The_Same_Order_As_They_Are_Input() {
            var itemHandler = new ItemHandler();
            var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromMilliseconds(250));
            timerEvents.Subscribe(async x => {
                var data = (int)x;
                Console.WriteLine($"Value Produced: {x}");                
                var dataAccepted = await itemHandler.SendAsync((int)data);
                if (dataAccepted) {
                    InputItems.Add(data);
                }                
            });

            await Task.Delay(5000);
            itemHandler.Complete();
            await itemHandler.Completion;

            CollectionAssert.AreEqual(InputItems, itemHandler.OutputValues);
        }

        private IList<int> InputItems {
            get;
        } = new List<int>();
    }

    public class ItemHandler {


        public ItemHandler() {            
            var options = new ExecutionDataflowBlockOptions() {
                BoundedCapacity = DataflowBlockOptions.Unbounded,
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                EnsureOrdered = true
            };
            PostProcessBlock = new TransformBlock<int, int>((Func<int, Task<int>>)PostProcess, options);

            var output = PostProcessBlock.AsObservable().Subscribe(x => {
                Console.WriteLine($"Value Output: {x}");
                OutputValues.Add(x);
            });
        }

        public async Task<bool> SendAsync(int data) {
            return await PostProcessBlock.SendAsync(data);
        }

        public void Complete() {
            PostProcessBlock.Complete();
        }

        public Task Completion {
            get { return PostProcessBlock.Completion; }
        }

        public IList<int> OutputValues {
            get;
        } = new List<int>();

        private IPropagatorBlock<int, int> PostProcessBlock {
            get;
        }

        private async Task<int> PostProcess(int data) {
            if (data % 3 == 0) {
                await Task.Delay(TimeSpan.FromSeconds(2));
            }            
            return data;
        }
    }
}

Answer 3 (score: 1)

Rx and TPL can be easily combined here, and TPL preserves the order of events by default, so your code could look something like this:

using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static async Task<long> PostprocessAsync(long x) { ... }

IObservable<Message> streamOfMessages = ...;

var streamOfTasks = new TransformBlock<long, long>(
    async msg => await PostprocessAsync(msg),
    // set the concurrency level for messages to handle
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// easily convert block into observable
IObservable<long> streamOfResults = streamOfTasks.AsObservable();

Edit: Rx extensions are meant to be a reactive pipeline for UI events. Since this type of application is generally single-threaded, the messages are handled in order. In general, however, events in C# aren't thread safe, so you have to provide some additional logic to keep the order.

If you don't want to introduce another dependency, you need to store the operation number using the Interlocked class.
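A minimal sketch of the operation-number idea, not the answer's original code: each operation is stamped with a sequence number via Interlocked.Increment when it starts, and finished results are held back until every earlier number has been emitted. The names OrderedResults and RunAsync are hypothetical helpers introduced for illustration; they are not part of Rx or TPL.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: numbers operations as they start and
// releases their results strictly in that order.
public sealed class OrderedResults<T>
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, T> _pending = new Dictionary<int, T>();
    private readonly Action<T> _emit;
    private int _started; // sequence numbers handed out so far
    private int _next;    // sequence number that must be emitted next

    public OrderedResults(Action<T> emit)
    {
        _emit = emit;
    }

    // Call once per incoming message, in arrival order.
    public async Task RunAsync(Func<Task<T>> operation)
    {
        // The number is taken before the first await, so it reflects call order;
        // Interlocked keeps the numbering safe if callers overlap.
        int index = Interlocked.Increment(ref _started) - 1;
        T result = await operation();

        lock (_gate)
        {
            _pending.Add(index, result);

            // Flush every buffered result that is now next in line.
            T ready;
            while (_pending.TryGetValue(_next, out ready))
            {
                _pending.Remove(_next);
                _next++;
                _emit(ready); // emitted under the lock to keep callbacks serialized
            }
        }
    }
}
```

With the question's sample program, this could be wired up by creating `new OrderedResults<long>(x => Console.WriteLine($"Processed: {x}"))` and calling `RunAsync(() => PostprocessAsync(x))` from the timerEvents subscription. The operations still run concurrently; only the emission of results is reordered.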

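Answer 4 (score: 1)

The Rx library contains three operators that can unwrap an observable sequence of tasks: Concat, Merge and Switch. All three accept a single argument named source, of type IObservable<Task<T>>, and return an IObservable<T>. Here are their descriptions from the documentation: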

Concat

Concatenates all task results, as long as the previous task terminated successfully.

Merge

Merges results from all source tasks into a single observable sequence.

Switch

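Transforms an observable sequence of tasks into an observable sequence producing values only from the most recent observable sequence. Each time a new task is received, the previous task's result is ignored.

In other words, Concat returns the results in their original order, Merge returns the results in order of completion, and Switch filters out the results of tasks that did not complete before the next task was emitted. So you can solve your problem just by using the built-in Concat operator. No custom operator is needed.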

var streamOfResults = streamOfTasks
    .Select(async task =>
    {
        var result1 = await task;
        var result2 = await PostprocessAsync(result1);
        return result2;
    })
    .Concat();

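The tasks have already been started before streamOfTasks emits them. In other words, they arrive in a "hot" state. So the fact that the Concat operator awaits them one after the other has no effect on the concurrency of the operations; it only affects the order of their results. This would matter if, instead of tasks, you had cold observables, such as those created by the Observable.FromAsync and Observable.Create methods, in which case Concat would execute the operations sequentially.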
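The hot/cold distinction can be sketched with the question's own sample setup. This is an illustration under assumptions, not part of the original answer: it reuses timerEvents and PostprocessAsync from the question, limits the timer with Take(6) so the run is finite, and uses a made-up class name, ColdVersusHot.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

static class ColdVersusHot
{
    // Same stand-in as in the question: every third message is slow.
    static async Task<long> PostprocessAsync(long x)
    {
        if (x % 3 == 0)
        {
            await Task.Delay(TimeSpan.FromSeconds(2.5));
        }
        return x;
    }

    static void Main()
    {
        var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1)).Take(6);

        // Hot: Select starts each task as soon as its message arrives, so the
        // operations overlap; Concat only puts the results back in order.
        IObservable<long> concurrent = timerEvents
            .Select(x => PostprocessAsync(x))
            .Concat();

        // Cold: FromAsync defers the call until Concat subscribes to it, so each
        // operation starts only after the previous one has completed.
        IObservable<long> sequential = timerEvents
            .Select(x => Observable.FromAsync(() => PostprocessAsync(x)))
            .Concat();

        concurrent.Subscribe(x => Console.WriteLine($"Hot tasks: {x}"));
        sequential.Subscribe(x => Console.WriteLine($"Cold observables: {x}"));

        Console.ReadLine();
    }
}
```

Both pipelines emit results in the original order. The second one also serializes the work itself, which matches the "one by one" processing the question asks for, at the cost of throughput when many messages are slow.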