TPL Dataflow block consumes all available memory

Time: 2015-06-23 05:31:13

Tags: c# .net task-parallel-library dataflow tpl-dataflow

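I have a TransformManyBlock with the following design:

  • Input: path to a file
  • Output: IEnumerable of the file's contents, one line at a time

I am running this block on a huge file (61GB), which is too large to fit in RAM. To avoid unbounded memory growth, I have set BoundedCapacity to a very low value (e.g. 1) for this block and for all downstream blocks. Nonetheless, the block apparently iterates the IEnumerable greedily, which consumes all available memory on the computer and grinds every process to a halt. The block's OutputCount keeps rising without bound until I kill the process.

What can I do to prevent the block from consuming the IEnumerable in this way?

EDIT: Here is an example program that illustrates the problem: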

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static IEnumerable<string> GetSequence(char c)
    {
        for (var i = 0; i < 1024 * 1024; ++i)
            yield return new string(c, 1024 * 1024);
    }

    static void Main(string[] args)
    {
        var options = new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 };
        var firstBlock = new TransformManyBlock<char, string>(c => GetSequence(c), options);
        var secondBlock = new ActionBlock<string>(str =>
            {
                Console.WriteLine(str.Substring(0, 10));
                Thread.Sleep(1000);
            }, options);

        firstBlock.LinkTo(secondBlock);
        firstBlock.Completion.ContinueWith(task =>
            {
                if (task.IsFaulted) ((IDataflowBlock) secondBlock).Fault(task.Exception);
                else secondBlock.Complete();
            });

        firstBlock.Post('A');
        firstBlock.Complete();
        for (; ; )
        {
            Console.WriteLine("OutputCount: {0}", firstBlock.OutputCount);
            Thread.Sleep(3000);
        }
    }
}

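If you are on a 64-bit box, make sure to clear the "Prefer 32-bit" option in Visual Studio. I have 16GB of RAM on my computer, and this program immediately consumes every available byte.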

3 answers:

Answer 0 (score: 4)

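You seem to misunderstand how TPL Dataflow works.

BoundedCapacity limits the number of items you can post into a block. In your case that means a single char into the TransformManyBlock and a single string into the ActionBlock.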

So you post a single item to the TransformManyBlock, which then returns 1024*1024 strings and tries to pass them on to the ActionBlock, which will only accept a single string at a time. The rest of the strings just sit in the TransformManyBlock's output queue.

What you probably want to do is create a single block and post items into it in a streaming fashion, waiting (synchronously or otherwise) whenever its capacity is reached:

private static void Main()
{
    MainAsync().Wait();
}

private static async Task MainAsync()
{
    var block = new ActionBlock<string>(async item =>
    {
        Console.WriteLine(item.Substring(0, 10));
        await Task.Delay(1000);
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    foreach (var item in GetSequence('A'))
    {
        await block.SendAsync(item);
    }

    block.Complete();
    await block.Completion;
}
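Applied to the file-reading scenario in the question, the same streaming idea might look like the sketch below. It is only an illustration under assumptions: the class and method names (FileStreamingSketch, ProcessLinesAsync), the temporary test file, and the placeholder per-line work are all hypothetical and not part of the original answer. It relies on File.ReadLines being lazy, so the next line is read only once the previous SendAsync has been accepted.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class FileStreamingSketch
{
    // Streams the lines of a file through a bounded ActionBlock and
    // returns how many lines were processed. Names are hypothetical.
    public static async Task<int> ProcessLinesAsync(string path, int capacity)
    {
        int processed = 0;
        var block = new ActionBlock<string>(line =>
        {
            // Placeholder for the real per-line work.
            Interlocked.Increment(ref processed);
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        // File.ReadLines enumerates lazily: the loop only advances to the
        // next line after the block has accepted the previous one.
        foreach (var line in File.ReadLines(path))
        {
            await block.SendAsync(line);
        }

        block.Complete();
        await block.Completion;
        return processed;
    }

    public static async Task Main()
    {
        // Small temporary file standing in for the huge input file.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "one", "two", "three" });
        try
        {
            Console.WriteLine(await ProcessLinesAsync(path, capacity: 1));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With this shape, at most `capacity` lines are held by the block at any moment, no matter how large the file is.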

Answer 1 (score: 0)

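If the pipeline's output rate is lower than the rate at which messages are posted, messages accumulate in the pipeline until memory runs out or some queue limit is reached. If the messages are large, the process will quickly be starved of memory.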

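If you set BoundedCapacity to 1, a message offered while the queue already holds one is not accepted immediately: Post returns false, and SendAsync returns a task that does not complete until space frees up. That is not the desired behaviour in cases like batch processing, for example. See this post for more insight.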

This working test illustrates my point:

//Change BoundedCapacity to +1 to see it fail
[TestMethod]
public void stackOverflow()
{      
    var total = 1000;
    var processed = 0;
    var block = new ActionBlock<int>(
       (messageUnit) =>
       {
           Thread.Sleep(10);
           Trace.WriteLine($"{messageUnit}");
           processed++;
       },
        new ExecutionDataflowBlockOptions() { BoundedCapacity = -1 } 
   );

    for (int i = 0; i < total; i++)
    {
        var result = block.SendAsync(i);
        Assert.IsTrue(result.IsCompleted, $"failed for {i}");
    }

    block.Complete();
    block.Completion.Wait();

    Assert.AreEqual(total, processed);
}
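The difference between Post and SendAsync on a full bounded block can also be seen in isolation. The sketch below is an illustration added for clarity, not part of the original answer: the class name BoundedCapacitySketch and the tuple field names are hypothetical. It uses a BufferBlock with no consumer attached, so the result does not depend on timing.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class BoundedCapacitySketch
{
    public static async Task<(bool first, bool second, bool pendingDone, bool third)> RunAsync()
    {
        // A buffer that can hold a single item and has no linked consumer.
        var buffer = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 1 });

        bool first = buffer.Post(1);   // accepted: the buffer was empty
        bool second = buffer.Post(2);  // declined: the buffer is full

        // SendAsync does not decline; it waits for space instead.
        Task<bool> pending = buffer.SendAsync(3);
        bool pendingDone = pending.IsCompleted;

        buffer.Receive();              // removes item 1 and frees the slot
        bool third = await pending;    // item 3 is accepted now

        return (first, second, pendingDone, third);
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync());
    }
}
```

Here the second Post is turned away, while the SendAsync task stays pending until Receive makes room. This is why the test above fails with a bounded capacity of 1: it asserts that every SendAsync task is already completed.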

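So my approach is to throttle the posting, so that the pipeline does not accumulate a large number of messages in its queues.

Below is a simple way to do that. This way the dataflow keeps processing messages at full speed, but messages do not accumulate, which avoids excessive memory consumption.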

//Should be adjusted for specific use.
public void postAsync(Message message)
{
    //Pseudocode: sum the InputCount of every block in the pipeline.
    while (block1.InputCount + ... + blockn.InputCount > 100)
    {
        Thread.Sleep(200);
        //Note: if a huge amount of memory is allocated for each message, the garbage collector may not keep up with the pace.
        //This is the perfect place to force the garbage collector to release memory.
    }
    block1.SendAsync(message);
}
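A runnable variant of this throttling idea, reduced to a single block, might look like the sketch below. It is an assumption-laden illustration rather than the answer's actual code: the class name ThrottledPostSketch, the parameters total and maxPending, and the simulated 1 ms of work per item are all hypothetical. It waits with Task.Delay instead of Thread.Sleep so that the producer does not block a thread.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class ThrottledPostSketch
{
    // Posts `total` items to an unbounded block, but pauses whenever
    // `maxPending` items are already queued. Returns the number of items
    // processed and the largest queue length the producer observed.
    public static async Task<(int processed, int peak)> RunAsync(int total, int maxPending)
    {
        int processed = 0;
        int peak = 0;

        var block = new ActionBlock<int>(async item =>
        {
            await Task.Delay(1);   // simulated work
            processed++;           // safe: default MaxDegreeOfParallelism is 1
        });

        for (int i = 0; i < total; i++)
        {
            // Throttle: wait until the input queue has drained a little.
            while (block.InputCount >= maxPending)
            {
                await Task.Delay(5);
            }
            block.Post(i);
            peak = Math.Max(peak, block.InputCount);
        }

        block.Complete();
        await block.Completion;
        return (processed, peak);
    }

    public static async Task Main()
    {
        var (processed, peak) = await RunAsync(total: 50, maxPending: 10);
        Console.WriteLine($"processed={processed}, peak queue length={peak}");
    }
}
```

Because there is a single producer, the queue length it observes never exceeds maxPending. With several producers, the check and the Post would need to be coordinated.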

Answer 2 (score: 0)

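It seems that creating an output-bounded TransformManyBlock takes three internal blocks:

  1. A TransformBlock that receives the input and produces IEnumerables, possibly running in parallel.
  2. A non-parallel ActionBlock that enumerates the produced IEnumerables and propagates the final results.
  3. A BufferBlock that stores the final results and respects the desired BoundedCapacity.

The tricky part is propagating the completion of the second block, because it is not directly linked to the third block. In the implementation below, the PropagateCompletion method is written according to the library's source code.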

public static IPropagatorBlock<TInput, TOutput>
    CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
    Func<TInput, Task<IEnumerable<TOutput>>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    if (dataflowBlockOptions == null)
        throw new ArgumentNullException(nameof(dataflowBlockOptions));

    var input = new TransformBlock<TInput, IEnumerable<TOutput>>(transform,
        dataflowBlockOptions);
    var output = new BufferBlock<TOutput>(dataflowBlockOptions);
    var middle = new ActionBlock<IEnumerable<TOutput>>(async results =>
    {
        if (results == null) return;
        foreach (var result in results)
        {
            var accepted = await output.SendAsync(result).ConfigureAwait(false);
            if (!accepted) break; // If one is rejected, the rest will be rejected too
        }
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 1,
        BoundedCapacity = dataflowBlockOptions.MaxDegreeOfParallelism,
        CancellationToken = dataflowBlockOptions.CancellationToken,
        SingleProducerConstrained = true,
    });

    input.LinkTo(middle, new DataflowLinkOptions() { PropagateCompletion = true });
    PropagateCompletion(middle, output);

    return DataflowBlock.Encapsulate(input, output);

    async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
    {
        try
        {
            await source.Completion.ConfigureAwait(false);
        }
        catch { }

        var exception = source.Completion.IsFaulted ? source.Completion.Exception : null;
        if (exception != null) target.Fault(exception); else target.Complete();
    }
}

// Overload with synchronous delegate
public static IPropagatorBlock<TInput, TOutput>
    CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
    Func<TInput, IEnumerable<TOutput>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions)
{
    return CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
        item => Task.FromResult(transform(item)), dataflowBlockOptions);
}

Usage example:

var firstBlock = CreateOutputBoundedTransformManyBlock<char, string>(
    c => GetSequence(c), options);