I have been playing around, out of boredom, with retrieving random articles from the wiki all at once. First I wrote this code:

    private async void Window_Loaded(object sender, RoutedEventArgs e)
    {
        await DownloadAsync();
    }

    private async Task DownloadAsync()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        var tasks = new List<Task>();
        var result = new List<string>();
        for (int index = 0; index < 60; index++)
        {
            var task = Task.Run(async () => {
                var scheduledAt = DateTime.UtcNow.ToString("mm:ss.fff");
                using (var client = new HttpClient())
                using (var response = await client.GetAsync("https://en.wikipedia.org/wiki/Special:Random"))
                using (var content = response.Content)
                {
                    var page = await content.ReadAsStringAsync();
                    var receivedAt = DateTime.UtcNow.ToString("mm:ss.fff");
                    var data = $"Job done at thread: {Thread.CurrentThread.ManagedThreadId}, Scheduled at: {scheduledAt}, Recieved at: {receivedAt} {page}";
                    result.Add(data);
                }
            });
            tasks.Add(task);
        }
        await Task.WhenAll(tasks.ToArray());
        sw.Stop();
        Console.WriteLine($"Process took: {sw.Elapsed.Seconds} sec {sw.Elapsed.Milliseconds} ms");
        foreach (var item in result)
        {
            Debug.WriteLine(item);
        }
    }

But I wanted to get rid of this async anonymous method, Task.Run(async () => ..., so I replaced the relevant part of the code with:

    for (int index = 0; index < 60; index++)
    {
        var task = Task.Run(() => {
            var scheduledAt = DateTime.UtcNow.ToString("mm:ss.fff");
            using (var client = new HttpClient())
            // Get this synchronously.
            using (var response = client.GetAsync("https://en.wikipedia.org/wiki/Special:Random").Result)
            using (var content = response.Content)
            {
                // Get this synchronously.
                var page = content.ReadAsStringAsync().Result;
                var receivedAt = DateTime.UtcNow.ToString("mm:ss.fff");
                var data = $"Job done at thread: {Thread.CurrentThread.ManagedThreadId}, Scheduled at: {scheduledAt}, Recieved at: {receivedAt} {page}";
                result.Add(data);
            }
        });
        tasks.Add(task);
    }

I expected it to perform exactly the same, because the async code that I replaced with synchronous code is wrapped in a task, so I am guaranteed that the task scheduler (the WPF task scheduler) will queue it on some free thread from the thread pool. And this is exactly what happens when I look at the returned results. I get values like:

    Job done at thread: 6, Scheduled at: 53:57.534, Recieved at: 54:54.545 ...
    Job done at thread: 21, Scheduled at: 54:06.742, Recieved at: 54:54.574 ...
    Job done at thread: 41, Scheduled at: 54:26.742, Recieved at: 54:54.576 ...
    Job done at thread: 10, Scheduled at: 53:59.018, Recieved at: 54:54.614 ...

The problem is that the first code executes in ~6 seconds and the second (the synchronous one, using .Result) takes ~50 seconds. The difference gets smaller as I reduce the number of tasks. Can anyone explain why they take so long, even though they are executed on separate threads and perform exactly the same single operation?

Answer 0 (score: 6)

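Because the thread pool may introduce a delay when you request a new thread once the total number of threads in the pool has reached a configurable minimum. By default, that minimum is the number of cores. In your example with .Result, you queue 60 tasks, each of which holds a thread pool thread for the entire duration of its execution. That means only as many tasks as there are cores start right away; the rest start with a delay (the thread pool waits a certain amount of time to see whether an already busy thread becomes available and, if not, adds a new thread).
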
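Even worse, the continuations of client.GetAsync (the code that runs inside GetAsync after it receives the reply from the server) are also scheduled on thread pool threads. That holds up all 60 of your tasks, because they cannot complete before receiving the result from GetAsync, and GetAsync needs a free thread pool thread to run its continuation. As a result, there is additional contention: there are the 60 tasks you created, and there are 60 continuations from GetAsync that also want a thread pool thread to run on (while your 60 tasks are blocked waiting for the results of those continuations).
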
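In the example with await, the thread pool thread is released for the duration of the asynchronous HTTP call. When you call await GetAsync() and GetAsync reaches the point of asynchronous I/O (actually issuing the HTTP request), your thread is returned to the pool and is free to handle other work. That means the await example holds thread pool threads for much less time, and there is (almost) no delay waiting for a thread pool thread to become available.
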
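To make that concrete, here is a minimal sketch of the await-only shape. It assumes the HTTP request can be modelled by Task.Delay so that it runs without a network; FakeRequestAsync and DownloadAllAsync are hypothetical names, not from the question. It also lets Task.WhenAll collect the results instead of adding to a shared List<string>, which is not safe to call Add on from several threads at once.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Hypothetical stand-in for the HTTP call: Task.Delay models the
    // asynchronous I/O wait, during which no thread pool thread is held.
    public static async Task<string> FakeRequestAsync(int id)
    {
        await Task.Delay(1000);
        return $"page {id}";
    }

    public static async Task<string[]> DownloadAllAsync(int count)
    {
        // ToList() starts every operation up front; nothing blocks here.
        var tasks = Enumerable.Range(0, count)
                              .Select(id => FakeRequestAsync(id))
                              .ToList();

        // WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        string[] pages = await DownloadAllAsync(60);
        sw.Stop();
        Console.WriteLine($"{pages.Length} results in {sw.ElapsedMilliseconds} ms");
    }
}
```

Since none of the 60 operations holds a thread while it waits, the total should stay close to the duration of a single simulated request rather than growing with the number of tasks.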
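You can easily confirm this by running the following (do NOT use this in real code; it is for testing only):

    ThreadPool.SetMinThreads(100, 100);

This raises the configurable minimum number of pool threads mentioned above. Once the minimum is large enough, all 60 tasks in the .Result example start simultaneously on 60 thread pool threads, without delay, so both of your examples complete in roughly the same time.
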
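If you try this, it may help to look at the current minimum first and to put it back afterwards. The sketch below is one way to do that; WithMinThreads is a hypothetical helper of mine, while ThreadPool.GetMinThreads and ThreadPool.SetMinThreads are the real APIs. It is still meant for experiments only.

```csharp
using System;
using System.Threading;

public static class MinThreadsSketch
{
    // Hypothetical helper: raises the pool minimums while `body` runs, then
    // restores the previous values. Returns false if the pool rejected the change.
    public static bool WithMinThreads(int minimum, Action body)
    {
        ThreadPool.GetMinThreads(out int oldWorker, out int oldIo);

        // Math.Max avoids lowering a minimum that is already higher than requested.
        bool changed = ThreadPool.SetMinThreads(
            Math.Max(minimum, oldWorker),
            Math.Max(minimum, oldIo));
        try
        {
            body();
        }
        finally
        {
            // Put the original minimums back, even if body() throws.
            ThreadPool.SetMinThreads(oldWorker, oldIo);
        }
        return changed;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Default minimum: {worker} worker, {io} I/O completion threads");

        WithMinThreads(100, () =>
        {
            ThreadPool.GetMinThreads(out int w, out int c);
            Console.WriteLine($"Raised minimum: {w} worker, {c} I/O completion threads");
            // Run the .Result experiment here.
        });
    }
}
```

The first line printed shows the default minimum on your machine, which is the number of tasks that can start without the injection delay.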
Here is a sample application for observing how this works:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    public class Program {
        public static void Main(string[] args) {
            DownloadAsync().Wait();
            Console.ReadKey();
        }

        private static async Task DownloadAsync() {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            var tasks = new List<Task>();
            for (int index = 0; index < 60; index++) {
                var tmp = index; // copy the loop variable for the closure
                var task = Task.Run(() => {
                    // max threads minus available threads = worker threads currently in use
                    ThreadPool.GetAvailableThreads(out int wt, out _);
                    ThreadPool.GetMaxThreads(out int mt, out _);
                    Console.WriteLine($"Started: {tmp} on thread {Thread.CurrentThread.ManagedThreadId}. Threads in pool: {mt - wt}");
                    var res = DoStuff(tmp).Result; // blocks this pool thread until the continuation has run
                    Console.WriteLine($"Done {res} on thread {Thread.CurrentThread.ManagedThreadId}");
                });
                tasks.Add(task);
            }
            await Task.WhenAll(tasks.ToArray());
            sw.Stop();
            Console.WriteLine($"Process took: {sw.Elapsed.Seconds} sec {sw.Elapsed.Milliseconds} ms");
        }

        public static async Task<string> DoStuff(int i) {
            await Task.Delay(1000); // web request
            Console.WriteLine($"continuation of {i} on thread {Thread.CurrentThread.ManagedThreadId}"); // continuation
            return i.ToString();
        }
    }
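
To put numbers on the difference, the same simulated job can be timed in both styles side by side. This is only a sketch under the same assumption as above (Task.Delay stands in for the web request); CompareSketch, FakeRequestAsync and MeasureAsync are hypothetical names of mine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CompareSketch
{
    // Hypothetical stand-in for the web request.
    private static async Task<int> FakeRequestAsync(int i)
    {
        await Task.Delay(500);
        return i;
    }

    // Runs `count` jobs on the thread pool and returns the total elapsed time.
    // block == true  : each job waits on .Result and keeps its pool thread.
    // block == false : each job awaits and gives its pool thread back while waiting.
    public static async Task<TimeSpan> MeasureAsync(int count, bool block)
    {
        var sw = Stopwatch.StartNew();
        var jobs = new List<Task<int>>();
        for (int index = 0; index < count; index++)
        {
            int i = index; // copy the loop variable for the closure
            Task<int> job = block
                ? Task.Run(() => FakeRequestAsync(i).Result)
                : Task.Run(async () => await FakeRequestAsync(i));
            jobs.Add(job);
        }
        await Task.WhenAll(jobs);
        sw.Stop();
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan awaited = await MeasureAsync(60, block: false);
        TimeSpan blocked = await MeasureAsync(60, block: true);
        Console.WriteLine($"await:   {awaited.TotalMilliseconds:F0} ms");
        Console.WriteLine($".Result: {blocked.TotalMilliseconds:F0} ms");
    }
}
```

On a machine with fewer than 60 cores I would expect the .Result line to be noticeably larger, and the gap to shrink as the task count approaches the pool minimum, which matches what the question observed when reducing the number of tasks.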