Question

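I have a WebRole running on a Small instance. This WebRole has a method that uploads a large number of files to blob storage. According to the Azure instance specs, a Small instance has only 1 core. So when uploading those blobs, will Parallel.ForEach give me any benefit over a regular foreach?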

Answer 1

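You would be much better served by focusing on the async versions of the blob storage API and/or the Stream API, so that your work is I/O bound rather than CPU bound. Anywhere there is a BeginXXX API, you should wrap it with Task.Factory.FromAsync and use a continuation from there. In your specific case you should use CloudBlob.BeginUploadFromStream. How you get the stream in the first place matters just as much, so look for async APIs on that end too.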

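After that, the only thing that may hold you back from using a Small instance is that it is capped at 100 Mbps, whereas a Medium instance gets 200 Mbps. Then again, you can always take advantage of elasticity: increase the role count when you need more processing, and scale back down when things calm down.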
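
To put those caps in perspective, here is a rough back-of-the-envelope sketch. It takes the 100 Mbps and 200 Mbps figures quoted above at face value and ignores protocol overhead, so the results are best-case lower bounds, not measurements. The class and method names are made up for illustration.

```csharp
using System;

static class BandwidthSketch
{
    // Best-case seconds needed to move `bytes` over a link capped at
    // `megabitsPerSecond` (1 Mbps taken as 1,000,000 bits per second).
    public static double MinSeconds(long bytes, double megabitsPerSecond)
    {
        return bytes * 8.0 / (megabitsPerSecond * 1000000.0);
    }

    static void Main()
    {
        long oneGigabyte = 1000L * 1000L * 1000L; // decimal gigabyte

        // 8,000,000,000 bits / 100,000,000 bits per second = 80 seconds
        Console.WriteLine("Small  (100 Mbps): {0:F0} s", MinSeconds(oneGigabyte, 100));
        // 8,000,000,000 bits / 200,000,000 bits per second = 40 seconds
        Console.WriteLine("Medium (200 Mbps): {0:F0} s", MinSeconds(oneGigabyte, 200));
    }
}
```

If the uploads already saturate the link, no amount of extra concurrency on a single instance will make them faster; at that point a larger instance or more roles is what helps.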

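Here is an example of how you would call BeginUploadFromStream using FromAsync. As for coordinating concurrent processing: since you are now kicking off async tasks, you can't count on Parallel.ForEach to constrain the maximum concurrency for you. This means you use a regular foreach on the original thread, with a Semaphore to limit concurrency. This gives you the equivalent of MaxDegreeOfParallelism: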

// Setup a semaphore to constrain the max # of concurrent "thing"s we will process
int maxConcurrency = ... read from config ...;
Semaphore maxConcurrentThingsToProcess = new Semaphore(maxConcurrency, maxConcurrency);

// Current thread will enumerate and dispatch I/O work async, this will be the only CPU resource we're holding during the async I/O
foreach(Thing thing in myThings)
{
    // Make sure we haven't reached max concurrency yet
    maxConcurrentThingsToProcess.WaitOne();

    try
    {
        Stream mySourceStream = ... get the source stream from somewhere ...;
        CloudBlob myCloudBlob = ... get the blob from somewhere ...;

        // Begin uploading the stream asynchronously
        Task uploadStreamTask = Task.Factory.FromAsync(
            myCloudBlob.BeginUploadFromStream,
            myCloudBlob.EndUploadFromStream,
            mySourceStream,
            null);

        // Setup a continuation that will fire when the upload completes (regardless of success or failure)
        uploadStreamTask.ContinueWith(uploadStreamAntecedent =>
        {
            try
            {
                // upload completed here, do any cleanup/post processing
            }
            finally
            {
                // Release the semaphore so the next thing can be processed
                maxConcurrentThingsToProcess.Release();
            }
        });
    }
    catch
    {
        // Something went wrong starting to process this "thing", release the semaphore
        maxConcurrentThingsToProcess.Release();

        throw;
    }
}

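In this sample I'm not showing how you should get the source stream asynchronously as well. If, for example, you were downloading that stream from a URL somewhere else, you would want to kick that off asynchronously too, and chain the start of the async upload shown here into a continuation on it.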
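
A minimal sketch of that chaining, assuming nothing about the real download or upload calls: DownloadAsync and UploadAsync below are hypothetical stand-ins that only simulate latency with Task.Delay. In real code each would be a Task.Factory.FromAsync wrapper around the corresponding Begin/End pair.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class ChainSketch
{
    // Hypothetical stand-in for an asynchronous download of the source stream.
    static Task<Stream> DownloadAsync(string url)
    {
        return Task.Delay(100).ContinueWith(_ =>
            (Stream)new MemoryStream(Encoding.UTF8.GetBytes("payload from " + url)));
    }

    // Hypothetical stand-in for the FromAsync(BeginUploadFromStream, ...) call.
    // Returns the number of bytes it pretended to upload.
    static Task<long> UploadAsync(Stream source)
    {
        return Task.Delay(100).ContinueWith(_ => source.Length);
    }

    // Starts the download, then starts the upload from a continuation once
    // the stream is available. No thread is blocked while either is pending.
    public static Task<long> DownloadThenUpload(string url)
    {
        return DownloadAsync(url)
            .ContinueWith(download => UploadAsync(download.Result))
            .Unwrap(); // flatten Task<Task<long>> into Task<long>
    }

    static void Main()
    {
        // Blocking on .Result is only for the demo's entry point.
        long bytes = DownloadThenUpload("http://example.invalid/file").Result;
        Console.WriteLine("Uploaded {0} bytes", bytes);
    }
}
```

Unwrap matters here: without it the outer task would complete as soon as the upload had been started, not when it had finished.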

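Believe me, I know this is more code than just doing a simple Parallel.ForEach, but Parallel.ForEach exists to make concurrency easy for CPU-bound tasks. When it comes to I/O, using the async APIs is the only way to achieve maximum I/O throughput while minimizing the CPU resources you consume.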
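
To check that the semaphore really does cap concurrency, here is a self-contained sketch that needs no storage account. It is an assumption-laden variant of the pattern above: FakeUploadAsync is a hypothetical stand-in that only waits 50 ms, and SemaphoreSlim is used in place of Semaphore. It records the highest number of uploads in flight at any moment.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleSketch
{
    // Hypothetical stand-in for an async blob upload; it only simulates latency.
    static Task FakeUploadAsync(int item)
    {
        return Task.Delay(50);
    }

    // Dispatches itemCount simulated uploads, never more than maxConcurrency
    // at once, and returns the peak number observed in flight.
    public static int Run(int itemCount, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var pending = new List<Task>();
        int inFlight = 0;
        int peak = 0; // only touched by the dispatching thread

        for (int i = 0; i < itemCount; i++)
        {
            gate.Wait(); // blocks the dispatching thread once the cap is reached
            peak = Math.Max(peak, Interlocked.Increment(ref inFlight));

            pending.Add(FakeUploadAsync(i).ContinueWith(t =>
            {
                // Runs on success or failure: count down, then free the slot.
                Interlocked.Decrement(ref inFlight);
                gate.Release();
            }));
        }

        Task.WaitAll(pending.ToArray()); // wait for the tail of the work
        return peak;
    }

    static void Main()
    {
        Console.WriteLine("Peak concurrent uploads: {0}", Run(20, 4));
    }
}
```

Because the counter is decremented before the semaphore is released, the reported peak can never exceed maxConcurrency. The only thread held for the duration is the one running the loop.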

Answer 2

The number of cores is not directly related to the number of threads that Parallel.ForEach() spawns.

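About a year ago, David Aiken ran a very informal test on a Small instance, with some blob + table access, with and without Parallel.ForEach(). You can see the results here. In that case there was a measured improvement, since the activity was not CPU bound. I suspect you would see a performance improvement as well, since you are uploading a large number of objects to blob storage.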
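
One way to see the first point for yourself is to count the threads Parallel.ForEach actually uses when the loop body blocks. This is only a sketch: Thread.Sleep is a stand-in for a blocking network call, and the number printed depends on the thread pool and the machine, so no particular value should be expected.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThreadCountSketch
{
    // Runs `items` simulated blocking calls through Parallel.ForEach and
    // returns how many distinct threads ended up doing the work.
    public static int CountThreads(int items)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        Parallel.ForEach(Enumerable.Range(0, items), i =>
        {
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            Thread.Sleep(100); // stand-in for a blocking network call
        });

        return threadIds.Count;
    }

    static void Main()
    {
        Console.WriteLine("Cores: {0}, threads used: {1}",
            Environment.ProcessorCount, CountThreads(16));
    }
}
```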

Answer 3

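Yes, it will, because each of your uploads will be network bound, so the scheduler can share your single core among them. (This is, after all, how single-core, single-CPU computers get more than one thing done at a time.)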

You could also use the asynchronous blob upload functions to get a similar effect.
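
A rough way to try this out without touching storage is to time the same set of simulated uploads both ways. In this sketch FakeUpload is a hypothetical stand-in that just sleeps for 100 ms, which mimics a thread waiting on the network. Actual timings will vary with the thread pool and the machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TimingSketch
{
    // Hypothetical stand-in for a blocking, network-bound upload.
    static void FakeUpload(int item)
    {
        Thread.Sleep(100);
    }

    // Milliseconds taken to run the uploads one after another.
    public static long SequentialMs(int items)
    {
        var watch = Stopwatch.StartNew();
        foreach (int i in Enumerable.Range(0, items)) FakeUpload(i);
        return watch.ElapsedMilliseconds;
    }

    // Milliseconds taken to run the same uploads through Parallel.ForEach.
    public static long ParallelMs(int items)
    {
        var watch = Stopwatch.StartNew();
        Parallel.ForEach(Enumerable.Range(0, items), i => FakeUpload(i));
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("foreach:          {0} ms", SequentialMs(16));
        Console.WriteLine("Parallel.ForEach: {0} ms", ParallelMs(16));
    }
}
```

While one thread sleeps, the core is free to run another, which is why the parallel loop can finish sooner even on one core. Each waiting thread is still tied up, though, which is the cost the async approach avoids.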

Using Parallel.Foreach in a small Azure instance
