C# Speed up parallel webrequests using async

Time: 2017-07-12 08:03:39

Tags: c# multithreading asynchronous parallel-processing httpwebrequest

So I have this code. This is the main function: a parallel for loop that iterates through all the data that needs to be posted and calls a function.

    ParallelOptions pOpt = new ParallelOptions();
    pOpt.MaxDegreeOfParallelism = 30;
    Parallel.For(0, maxsize, pOpt, (index, loopstate) => {

        // Calls the function where all the web requests are made
        CallRequests(data1, data2);

        if (isAborted)
            loopstate.Stop();
    });

This function is called inside the parallel loop

public static void CallRequests(string data1, string data2)     
    {
        var cookie = new CookieContainer();
        var postData =  Parameters[23] + data1 +
                        Parameters[24] + data2;

        HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
        getRequest.Accept = Parameters[26];
        getRequest.KeepAlive = true;
        getRequest.Referer = Parameters[27];
        getRequest.CookieContainer = cookie;
        getRequest.UserAgent = Parameters[28];
        getRequest.Method = WebRequestMethods.Http.Post;
        getRequest.AllowWriteStreamBuffering = true;
        getRequest.ProtocolVersion = HttpVersion.Version10;
        getRequest.AllowAutoRedirect = false;
        getRequest.ContentType = Parameters[29];
        getRequest.ReadWriteTimeout = 5000;
        getRequest.Timeout = 5000;
        getRequest.Proxy = null;

        byte[] byteArray = Encoding.ASCII.GetBytes(postData);
        getRequest.ContentLength = byteArray.Length;
        Stream newStream = getRequest.GetRequestStream(); //open connection
        newStream.Write(byteArray, 0, byteArray.Length); // Send the data.
        newStream.Close();

        HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();

        if (getResponse.Headers["Location"] == Parameters[30])
        {
            //These are simple get requests to retrieve the source code using the same format as above.
            //I need to preserve the cookie
            GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
            GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
        }
    }

From what I have seen and been told, I understand that making these requests async is a better idea than using a parallel loop. My method is also heavy on the processor. I wonder how I can make these requests async but also preserve the multithreaded aspect. I also need to keep the cookie after the POST request finishes.

1 answer:

Answer 0 (score: 2)

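Converting the CallRequests method to async is really just a matter of swapping the synchronous method calls for their asynchronous counterparts using the await keyword, and changing the method signature to return Task.

Something like this:

    public static async Task CallRequestsAsync(string data1, string data2)
    {
        var cookie = new CookieContainer();
        var postData = Parameters[23] + data1 +
                       Parameters[24] + data2;

        HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
        getRequest.Accept = Parameters[26];
        getRequest.KeepAlive = true;
        getRequest.Referer = Parameters[27];
        getRequest.CookieContainer = cookie;
        getRequest.UserAgent = Parameters[28];
        getRequest.Method = WebRequestMethods.Http.Post;
        getRequest.AllowWriteStreamBuffering = true;
        getRequest.ProtocolVersion = HttpVersion.Version10;
        getRequest.AllowAutoRedirect = false;
        getRequest.ContentType = Parameters[29];
        getRequest.ReadWriteTimeout = 5000;
        getRequest.Timeout = 5000;
        getRequest.Proxy = null;

        byte[] byteArray = Encoding.ASCII.GetBytes(postData);
        getRequest.ContentLength = byteArray.Length;
        using (Stream newStream = await getRequest.GetRequestStreamAsync()) // open connection
        {
            await newStream.WriteAsync(byteArray, 0, byteArray.Length); // send the data
        }

        // Await the response as well, so no thread is blocked while waiting on the server
        using (var getResponse = (HttpWebResponse)await getRequest.GetResponseAsync())
        {
            if (getResponse.Headers["Location"] == Parameters[30])
            {
                //These are simple get requests to retrieve the source code using the same format as above.
                //I need to preserve the cookie
                GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
                GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
            }
        }
    }

However, this on its own doesn't really get you anywhere, because you still need to wait for the returned tasks in your main method. A very simple (if somewhat blunt) way of doing so is to call Task.WaitAll() (or await Task.WhenAll() if the calling method is itself going to become async). Like this:

    var tasks = Enumerable.Range(0, maxsize).Select(index => CallRequestsAsync(data1, data2));
    Task.WaitAll(tasks.ToArray());

However, this really is pretty blunt, and it loses control over how many iterations run in parallel, and so on. I much prefer to use the TPL Dataflow library for this kind of thing. The library provides a way of chaining asynchronous (or synchronous) operations in parallel and passing them from one "processing block" to the next. It has a multitude of options for tweaking the degree of parallelism, buffer sizes, etc.

A detailed exposition is beyond the scope of this answer, so I'd encourage you to read up on it, but one possible approach would be to simply push the work to an action block, something like this:

    var actionBlock = new ActionBlock<int>(async index =>
    {
        await CallRequestsAsync(data1, data2);
    }, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 30,
        BoundedCapacity = 100,
    });

    for (int i = 0; i < maxsize; i++)
    {
        // With BoundedCapacity set, Post() returns false and drops the item when the
        // block is full, so wait on SendAsync() instead.
        actionBlock.SendAsync(i).Wait(); // or await actionBlock.SendAsync(i) if calling method is also async
    }

    actionBlock.Complete();
    actionBlock.Completion.Wait(); // or await actionBlock.Completion if calling method is also async

A couple of additional points outside the scope of my answer that I should mention in passing:

  1. It looks like your CallRequests method is updating some external variable with its results. Where possible, it is best to avoid this pattern and have the method return its results for collation later (which the TPL Dataflow library handles through TransformBlock<>). If updating external state is unavoidable, make sure you have thought about the multithreading implications (deadlocks, race conditions, etc.), which are outside the scope of my answer.
  2. I am assuming that index has some useful purpose that was lost when you created the minimal description for your question? Does it index into a parameter list or something similar? If so, you could always iterate over those directly and change the ActionBlock<int> to an ActionBlock of whatever type your parameter is.
  3. Make sure you understand the difference between multi-threaded/parallel execution and asynchronous execution. There are certainly some similarities and overlap, but simply making something async does not make it multi-threaded, nor is the converse true.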
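
To illustrate the first point, here is a minimal sketch of returning results through a TransformBlock<> rather than updating external state. It is only one possible shape, not a definitive implementation: FakeRequestAsync is a hypothetical stand-in for a version of CallRequestsAsync that returns a string, the Task.Delay merely simulates network latency, and on .NET Framework the System.Threading.Tasks.Dataflow NuGet package is assumed to be installed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TransformBlockSketch
{
    // Hypothetical stand-in for a CallRequestsAsync that returns its result
    // instead of writing it to a shared variable.
    public static async Task<string> FakeRequestAsync(int index)
    {
        await Task.Delay(10); // simulated network latency (assumption)
        return "result-" + index;
    }

    public static async Task<List<string>> RunAsync(int maxsize)
    {
        // Runs up to 30 requests concurrently and emits one string per input.
        var transform = new TransformBlock<int, string>(
            index => FakeRequestAsync(index),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 30,
                BoundedCapacity = 100,
            });

        // An ActionBlock processes one item at a time by default,
        // so adding to a plain List<string> here needs no locking.
        var results = new List<string>();
        var collect = new ActionBlock<string>(r => results.Add(r));

        // PropagateCompletion forwards Complete() from transform to collect.
        transform.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < maxsize; i++)
        {
            // SendAsync waits for free capacity instead of dropping the item.
            await transform.SendAsync(i);
        }

        transform.Complete();
        await collect.Completion;
        return results;
    }

    public static async Task Main()
    {
        List<string> results = await RunAsync(50);
        Console.WriteLine(results.Count);
    }
}
```

The collecting block is the only place that touches the list, which keeps the shared-state concerns from point 1 confined to a single, sequential step.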