Question

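I'm processing PDFs of widely varying sizes (from simple 2 MB files to high-DPI scans of a few hundred MB) via a Parallel.ForEach, and I occasionally hit an OutOfMemoryException. This is understandable: the process is 32-bit, and the threads spawned by the Parallel.ForEach do work that consumes a large and unpredictable amount of memory.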

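Restricting MaxDegreeOfParallelism does work, but throughput is then not sufficient when there is a large batch (10k+) of small PDFs to process, because more threads could be working given the small memory footprint of each one. This is a CPU-heavy process: Parallel.ForEach easily reaches 100% CPU, then hits the occasional group of large PDFs and throws an OutOfMemoryException. Running the Performance Profiler backs this up.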

From my understanding, supplying a partitioner to my Parallel.ForEach won't improve my performance.

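This leads me to passing a custom TaskScheduler to my Parallel.ForEach, with a MemoryFailPoint check inside it. Searching around, there seems to be little information on creating custom TaskScheduler objects.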
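
For readers unfamiliar with the hook being discussed, a scheduler is handed to Parallel.ForEach through ParallelOptions.TaskScheduler. The following is only a minimal sketch: the list of sizes is made-up stand-in data, the loop body is a placeholder for the real PDF work, and TaskScheduler.Default is used purely as a stand-in where a custom TaskScheduler subclass would go.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical work list: sizes in MB standing in for PDFs of varying size.
var pdfSizesMb = new List<int> { 2, 5, 300, 8, 450, 3 };

// Stand-in scheduler (assumption); a custom TaskScheduler subclass would be assigned here.
TaskScheduler scheduler = TaskScheduler.Default;

var options = new ParallelOptions
{
    TaskScheduler = scheduler,
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

long totalMb = 0;
Parallel.ForEach(pdfSizesMb, options, sizeMb =>
{
    // Placeholder for the real rasterization work; here we only sum the sizes.
    Interlocked.Add(ref totalMb, sizeMb);
});

Console.WriteLine($"Processed {pdfSizesMb.Count} items, {totalMb} MB in total.");
```

Every task the loop creates is queued through the scheduler given in the options, which is why a memory check placed inside a custom scheduler can gate the work.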

After looking through Specialized Task Schedulers in .NET 4 Parallel Extensions Extras, A custom TaskScheduler in C#, and various answers here on Stack Overflow, I've created my own TaskScheduler, with a QueueTask method as follows:

protected override void QueueTask(Task task)
{
    lock (tasks) tasks.AddLast(task);
    try
    {
        using (MemoryFailPoint memFailPoint = new MemoryFailPoint(600))
        {
            if (runningOrQueuedCount < maxDegreeOfParallelism)
            {
                runningOrQueuedCount++;
                RunTasks();
            }
        }
    }
    catch (InsufficientMemoryException e)
    {     
        // somehow return thread to pool?           
        Console.WriteLine("InsufficientMemoryException");
    }
}

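While the try/catch is a little expensive, my goal is to detect when the probable maximum-size PDF of 600 MB (plus a little extra memory overhead) would throw an OutOfMemoryException. However, this solution seems to kill off the thread attempting to do the work whenever I catch the InsufficientMemoryException. With enough large PDFs, my code ends up being a single-threaded Parallel.ForEach.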
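
To isolate the check from the scheduler plumbing, here is a minimal sketch of a MemoryFailPoint gate. The helper name TryRunWithMemoryGate and the 16 MB figure are assumptions made for illustration. Note that passing the check is not a guarantee that later allocations will succeed, and that the catch block also swallows an InsufficientMemoryException thrown by the work itself.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: runs the work only if the requested amount of memory
// appears to be available, and reports whether the work was run.
bool TryRunWithMemoryGate(int requiredMb, Action work)
{
    try
    {
        // The constructor throws InsufficientMemoryException when the
        // requested amount of memory does not appear to be available.
        using (new MemoryFailPoint(requiredMb))
        {
            work();
            return true;
        }
    }
    catch (InsufficientMemoryException)
    {
        // The caller decides what to do next: defer, re-queue, or give up.
        return false;
    }
}

// Usage with a small, assumed requirement of 16 MB.
bool ran = false;
bool ok = TryRunWithMemoryGate(16, () => ran = true);
Console.WriteLine(ok ? "work ran" : "work deferred");
```

Returning a flag instead of throwing leaves the decision with the caller, so a failed check does not have to end the worker that made it.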

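Other questions on Stack Overflow about Parallel.ForEach and OutOfMemoryExceptions don't appear to fit my use case of maximum throughput with dynamic memory usage per thread, and they often just use MaxDegreeOfParallelism as a static solution.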

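So, to get maximum throughput for variable working-memory sizes:

How do I return a thread to the thread pool when it has been denied work by the MemoryFailPoint check?
How and where do I safely spawn new threads to pick up work again once memory is free?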

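Edit: The size of a PDF on disk may not linearly represent its size in memory, because of the rasterization and rasterized-image manipulation components (which depend on the PDF content).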

Answer 1

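Using the LimitedConcurrencyLevelTaskScheduler from Samples for Parallel Programming with the .NET Framework, I was able to make a minor adjustment and get something that looks like what I wanted. The following is the NotifyThreadPoolOfPendingWork method of the LimitedConcurrencyLevelTaskScheduler class after modification: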

private void NotifyThreadPoolOfPendingWork()
{
    ThreadPool.UnsafeQueueUserWorkItem(_ =>
    {
        // Note that the current thread is now processing work items.
        // This is necessary to enable inlining of tasks into this thread.
        _currentThreadIsProcessingItems = true;
        try
        {
            // Process all available items in the queue.
            while (true)
            {
                Task item;
                lock (_tasks)
                {
                    // When there are no more items to be processed,
                    // note that we're done processing, and get out.
                    if (_tasks.Count == 0)
                    {
                        --_delegatesQueuedOrRunning;
                        break;
                    }

                    // Get the next item from the queue
                    item = _tasks.First.Value;
                    _tasks.RemoveFirst();
                }

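                // Execute the task we pulled out of the queue, but only if
                // roughly 650 MB of memory appears to be available.
                try
                {
                    using (new MemoryFailPoint(650))
                    {
                        base.TryExecuteTask(item);
                    }
                }
                catch (InsufficientMemoryException)
                {
                    // Not enough memory right now: pause this thread so it
                    // does not retry immediately, then put the task back at
                    // the end of the queue.
                    Thread.Sleep(500);

                    lock (_tasks)
                    {
                        _tasks.AddLast(item);
                    }
                }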

            }
        }
        // We're done processing items on the current thread
        finally { _currentThreadIsProcessingItems = false; }
    }, null);
}

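Let's look at the catch block, but in reverse order. We add the task we were about to work on back to the list of tasks (_tasks), so that an available thread can pick up that work. But first we put the current thread to sleep, so that it does not pick the work straight back up and run into another failed MemoryFailPoint check.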
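
One thing to keep in mind with the approach above is that the sleep-and-re-queue loop has no upper bound: if memory never frees up, the task is retried indefinitely. As a variation (not part of the original answer), the sleep-then-retry idea can be sketched with a cap on the number of attempts. The helper name RetryWithDelay, its parameters, and the simulated check are all hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical helper: calls attempt() until it returns true or the attempts
// run out, sleeping between failed attempts.
bool RetryWithDelay(Func<bool> attempt, int maxAttempts, int delayMs)
{
    for (int i = 0; i < maxAttempts; i++)
    {
        if (attempt()) return true;

        // Back off before the next attempt; no need to sleep after the last one.
        if (i < maxAttempts - 1) Thread.Sleep(delayMs);
    }
    return false;
}

// Usage: a simulated memory check that fails twice and then succeeds.
int calls = 0;
bool succeeded = RetryWithDelay(() => ++calls >= 3, maxAttempts: 5, delayMs: 10);
Console.WriteLine($"succeeded={succeeded} after {calls} attempts");
```

In a real scheduler the attempt delegate would wrap the MemoryFailPoint check and the task execution, and a false result would be the point at which to log the item or fail it rather than retry forever.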

Parallel.ForEach using a custom TaskScheduler to prevent OutOfMemoryException
