Question

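I'm trying to build yet another web spider, and I decided to use tasks for it. I wrote a small proof of concept. It works, but I think it is a bit slow.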

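using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program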
{
    static void Main(string[] args)
    {
        InitializeUrls();

        Start();


        Console.ReadKey();
    }

    private static void InitializeUrls()
    {
        _random = new Random();

        List<int> numbers = Enumerable.Range(0, 100).ToList();
        foreach (int number in numbers)
            _urls.Add(number.ToString());
    }

    private static readonly BlockingCollection<string> _urls = new BlockingCollection<string>();

    private static readonly TaskFactory _factory = new TaskFactory();

    private static CancellationTokenSource _tokenSource;

    private static Task _task;

    private static Random _random;

    public static void Start()
    {
        _tokenSource = new CancellationTokenSource();
        _task = _factory.StartNew(
            () =>
            {
                try
                {
                    Parallel.ForEach(
                        _urls.GetConsumingEnumerable(),
                        new ParallelOptions
                        {
                            MaxDegreeOfParallelism = 100, //number of threads running parallel
                            CancellationToken = _tokenSource.Token
                        },
                        (url, loopState) =>
                        {
                            if (!_tokenSource.IsCancellationRequested)
                            {
                                //here is the action
                                int waitTime = 5;// _random.Next(0, 15);

                                Console.WriteLine(string.Format("url {0}\ttime {1}\tthreadID {2}", url, waitTime,Thread.CurrentThread.ManagedThreadId));
                                Thread.Sleep(waitTime * 1000);
                            }
                            else
                            {
                                //stop

                                loopState.Stop();
                            }
                        });
                }
                catch (OperationCanceledException exception)
                {
                    // The {0} placeholder is required; without it the exception is silently dropped.
                    Console.WriteLine("Error when ending the operation: {0}", exception);
                }
                catch (Exception exception)
                {
                    Console.WriteLine("General exception: {0}", exception);
                }
            },
            _tokenSource.Token);
    }
}

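As you can see, I can set the number of threads that run at once. When I set it to 1, it works as expected: it writes a URL to the console and waits five seconds. When I set it to 100, I expected it to create a hundred tasks immediately, but if you run it, it doesn't. It works through the URLs very slowly. Do you know why this happens?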

Answer 1

"When I set it to 100, I expected it to create a hundred tasks immediately"

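That is your mistake. The property you set is not called "DegreeOfParallelism"; it is called "MaxDegreeOfParallelism". Parallel.ForEach starts with a small number of tasks and then, as work completes, ramps up toward the maximum you defined.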
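To see that MaxDegreeOfParallelism is only an upper bound, here is a minimal sketch that records the highest number of loop bodies running at the same time. The class name RampUpDemo, the method MeasurePeakConcurrency, and the 200 ms sleep are made up for illustration; the peak you observe depends on your machine and on the thread pool's current state, so treat the printed number as an observation rather than a guarantee.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RampUpDemo
{
    // Runs itemCount blocking iterations and returns the highest number
    // of iterations that were observed running at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegree, int sleepMs)
    {
        int running = 0; // iterations currently inside the loop body
        int peak = 0;    // highest value of running seen so far

        Parallel.ForEach(
            Enumerable.Range(0, itemCount),
            new ParallelOptions { MaxDegreeOfParallelism = maxDegree },
            item =>
            {
                int now = Interlocked.Increment(ref running);

                // Lock-free "peak = max(peak, now)".
                int seen;
                while (now > (seen = Volatile.Read(ref peak)))
                {
                    Interlocked.CompareExchange(ref peak, now, seen);
                }

                Thread.Sleep(sleepMs); // stand-in for a blocking download
                Interlocked.Decrement(ref running);
            });

        return peak;
    }

    public static void Main()
    {
        int peak = MeasurePeakConcurrency(itemCount: 100, maxDegree: 100, sleepMs: 200);
        Console.WriteLine("Peak concurrent iterations: {0} (upper bound was 100)", peak);
    }
}
```

The returned value never exceeds maxDegree, but with short-running work on a freshly started process it will often be well below it.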

I highly recommend reading the free e-book "Patterns of Parallel Programming" from Microsoft. It covers behaviors like this one in Parallel.ForEach.

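If you want 100 threads right away, just use a plain foreach and queue the work items yourself. You will need some kind of rate limiter to cap the maximum degree of parallelism.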

var degreeOfParallelism = new Semaphore(100, 100);

foreach(var loopUrl in _urls.GetConsumingEnumerable())
{
    //If you are on C# 5 this line is not necessary.
    var url = loopUrl;

    if (_tokenSource.IsCancellationRequested)
    {
        //Stop
        break;
    }

    //Takes one slot up in the pool of 100.
    degreeOfParallelism.WaitOne();

    ThreadPool.QueueUserWorkItem((state) =>
    {
        try
        {    
            //here is the action
            int waitTime = 5;// _random.Next(0, 15);

            Console.WriteLine(string.Format("url {0}\ttime {1}\tthreadID {2}", url, waitTime,Thread.CurrentThread.ManagedThreadId));
            Thread.Sleep(waitTime * 1000);
        }
        finally
        {
            //Release one slot back to the pool.
            degreeOfParallelism.Release();
        }            
    });
}

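However, if you are writing a web crawler and are on .NET 4.5, you don't need to use threads at all. Instead, use the XxxxxAsync() versions of the functions: you can keep a list of 100 tasks and call Task.WhenAny(yourTaskList) to detect when one of them has finished.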
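The Task.WhenAny idea could look roughly like the sketch below. ThrottledCrawler, ProcessAllAsync, and the fetchAsync delegate are hypothetical names, and Task.Delay stands in for whatever real XxxxxAsync call you would make; this is one way to do it under those assumptions, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledCrawler
{
    // Processes every URL while keeping at most maxInFlight operations pending.
    // fetchAsync is a hypothetical stand-in for a real asynchronous download.
    // Returns the number of URLs that were processed.
    public static async Task<int> ProcessAllAsync(
        IEnumerable<string> urls, int maxInFlight, Func<string, Task> fetchAsync)
    {
        var inFlight = new List<Task>();
        int completed = 0;

        foreach (string url in urls)
        {
            // When the window is full, wait for any one task to finish first.
            if (inFlight.Count >= maxInFlight)
            {
                Task finished = await Task.WhenAny(inFlight);
                inFlight.Remove(finished);
                await finished; // rethrows if that operation failed
                completed++;
            }

            inFlight.Add(fetchAsync(url));
        }

        // Wait for whatever is still pending after the last URL was started.
        await Task.WhenAll(inFlight);
        completed += inFlight.Count;
        return completed;
    }

    public static void Main()
    {
        var urls = new List<string>();
        for (int i = 0; i < 100; i++) urls.Add(i.ToString());

        // Task.Delay simulates five seconds of latency without blocking a thread.
        int done = ProcessAllAsync(urls, 100, async url =>
        {
            Console.WriteLine("url {0}", url);
            await Task.Delay(5000);
        }).GetAwaiter().GetResult();

        Console.WriteLine("Processed {0} URLs", done);
    }
}
```

Scanning the list on every Task.WhenAny call is fine for a window of around 100 tasks; for much larger windows a SemaphoreSlim-based throttle may scale better.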

Answer 2

If your work is CPU-bound, adding more threads than you have cores will not help.

If your work is not CPU-bound (for example, it just sleeps), you should use async instead (await Task.WhenAll(stuff.Select(async s => await ...))), so that you don't need any threads.
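As a rough illustration of that one-liner, the sketch below replaces the question's Thread.Sleep with Task.Delay and awaits all items together. WhenAllDemo and ProcessAllAsync are invented names, and the delay is only a placeholder for real I/O.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Starts one asynchronous wait per URL and completes when all have finished.
    // Results come back in the same order as the input.
    public static async Task<string[]> ProcessAllAsync(string[] urls, int delayMs)
    {
        return await Task.WhenAll(urls.Select(async url =>
        {
            await Task.Delay(delayMs); // placeholder for real I/O; no thread is blocked
            return "done " + url;
        }));
    }

    public static void Main()
    {
        string[] urls = Enumerable.Range(0, 100).Select(n => n.ToString()).ToArray();

        var watch = Stopwatch.StartNew();
        string[] results = ProcessAllAsync(urls, 5000).GetAwaiter().GetResult();

        Console.WriteLine("{0} items in {1:F1} s", results.Length, watch.Elapsed.TotalSeconds);
    }
}
```

Because all the delays are pending at once, the whole batch should take about as long as a single delay rather than a hundred of them. Note that this version applies no concurrency limit at all, which may or may not be what you want against a real server.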

Parallel.ForEach is too slow
