What is a good way to parallelize many HTTP web requests?

Asked: 2012-10-23 08:54:02

Tags: c# multithreading .net-4.0

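I am building a generic URI retrieval system. Essentially there is a generic class, Retriever<T>, which maintains a queue of URIs to retrieve. It has a separate thread that processes that queue as fast as it can. As the question title indicates, one example of a URI type is HTTP.

The problem is that when I actually get to requesting that a resource be retrieved, via the abstract method T RetrieveResource(Uri location), it slows down due to a lack of asynchrony.

My first thought was to change the return type of RetrieveResource to Task<T>. However, this seems to make tasks pile up and cause many problems when we have thousands of outstanding tasks. It appears to create many actual threads rather than using the thread pool. I imagine this just slows everything down, because too many things are going on at once and so nothing makes significant progress.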
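One way to keep tasks from piling up is to cap how many are in flight at once. The following is only a minimal sketch of that idea, not code from the original system: FakeRetrieve, the example.com URIs and the limit of 4 are hypothetical stand-ins, and it sticks to APIs available in .NET 4.0 (SemaphoreSlim, Task.Factory.StartNew).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledTasksSketch
{
    // Hypothetical stand-in for RetrieveResource; sleeps to simulate latency.
    private static string FakeRetrieve(Uri location)
    {
        Thread.Sleep(50);
        return "content of " + location;
    }

    public static void Main()
    {
        const int maxOutstanding = 4; // assumed limit; tune for the workload

        using (SemaphoreSlim slots = new SemaphoreSlim(maxOutstanding))
        {
            Task[] tasks = new Task[20];

            for (int i = 0; i < tasks.Length; i++)
            {
                Uri location = new Uri("http://example.com/item/" + i);

                // Blocks the producer once maxOutstanding tasks are in flight.
                slots.Wait();

                tasks[i] = Task.Factory.StartNew(() =>
                {
                    try
                    {
                        Console.WriteLine(FakeRetrieve(location));
                    }
                    finally
                    {
                        // Free the slot even if the retrieval throws.
                        slots.Release();
                    }
                });
            }

            Task.WaitAll(tasks);
        }
    }
}
```

The trade-off is that the producing thread blocks while all slots are taken, so this bounds the number of outstanding tasks rather than making the producer itself asynchronous.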

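We expect to have a large number of queued items to retrieve, and they cannot be processed as fast as they are enqueued. The system has an opportunity to catch up over time, but it is definitely not quick.

I have also considered, instead of maintaining a queue and a thread to process it, simply queuing work items on the ThreadPool. However, I am not sure this is ideal if, say, I need to shut the system down before all work items are processed, or later want to allow for prioritization.

We also know that retrieving a resource is a time-consuming process (0.250 to 5 seconds), but not necessarily a resource-intensive one. We are fine parallelizing this to hundreds of requests.

Our requirements are:

  • URIs can be enqueued from any thread, even while the system is processing the queue
  • Retrieval may later need to support prioritization
  • It should be possible to pause retrieval
  • Minimal spinning should occur when there is nothing to retrieve (BlockingCollection is useful here).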
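To illustrate the first and last requirements, here is a small sketch (an illustration, not the code under discussion) in which several consumer threads block on a BlockingCollection without spinning while another thread enqueues. The worker count, the example.com URIs and the simulated delay are assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BlockingConsumersSketch
{
    public static void Main()
    {
        BlockingCollection<Uri> pending =
            new BlockingCollection<Uri>(new ConcurrentQueue<Uri>());
        int processed = 0;

        Thread[] workers = new Thread[3]; // assumed concurrency level
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding() has been called and the queue is drained.
                foreach (Uri location in pending.GetConsumingEnumerable())
                {
                    Thread.Sleep(20); // simulated retrieval
                    Interlocked.Increment(ref processed);
                }
            });
            workers[i].Start();
        }

        // Enqueue from the main thread while the workers are already running.
        for (int i = 0; i < 10; i++)
        {
            pending.Add(new Uri("http://example.com/" + i));
        }

        pending.CompleteAdding();

        foreach (Thread worker in workers)
        {
            worker.Join();
        }

        Console.WriteLine(processed); // prints 10
        pending.Dispose();
    }
}
```

Prioritization and pausing are not covered by this sketch; BlockingCollection accepts any IProducerConsumerCollection<T>, so a priority-aware backing collection would be one possible direction.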

Is there a good way to parallelize this without introducing unnecessary complexity?

Below is some of our existing code, as an example.

public abstract class Retriever<T> : IRetriever<T>, IDisposable
{
    private readonly Thread worker;
    private readonly BlockingCollection<Uri> pending;
    private volatile int isStarted;
    private volatile int isDisposing;

    public event EventHandler<RetrievalEventArgs<T>> Retrieved;

    protected Retriever()
    {
        this.worker = new Thread(this.RetrieveResources);
        this.pending = new BlockingCollection<Uri>(new ConcurrentQueue<Uri>());
        this.isStarted = 0;
        this.isDisposing = 0;
    }

    ~Retriever()
    {
        this.Dispose(false);
    }

    private void RetrieveResources()
    {
        while (this.isDisposing == 0)
        {
            // Monitor.Wait must be called while holding the lock on the object.
            lock (this.pending)
            {
                while (this.isStarted == 0)
                {
                    Monitor.Wait(this.pending);
                }
            }

            Uri location = this.pending.Take();

            // This is what needs to be concurrently done.
            // In this example, it's synchronous, but just on a separate thread.
            T result = this.RetrieveResource(location);

            // At this point, we would fire our event with the retrieved data
        }
    }

    protected abstract T RetrieveResource(Uri location);

    protected void Dispose(bool disposing)
    {
        if (Interlocked.CompareExchange(ref this.isDisposing, 1, 0) == 1)
        {
            return;
        }

        if (disposing)
        {
            this.pending.CompleteAdding();
            this.worker.Join();
        }
    }

    public void Add(Uri uri)
    {
        try
        {
            this.pending.Add(uri);
        }
        catch (InvalidOperationException)
        {
            return;
        }
    }

    public void AddRange(IEnumerable<Uri> uris)
    {
        foreach (Uri uri in uris)
        {
            try
            {
                this.pending.Add(uri);
            }
            catch (InvalidOperationException)
            {
                return;
            }
        }
    }

    public void Start()
    {
        if (Interlocked.CompareExchange(ref this.isStarted, 1, 0) == 1)
        {
            throw new InvalidOperationException("The retriever is already started.");
        }

        if (this.worker.ThreadState == ThreadState.Unstarted)
        {
            this.worker.Start();
        }

        // Monitor.Pulse must be called while holding the lock on the object.
        lock (this.pending)
        {
            Monitor.Pulse(this.pending);
        }
    }

    public void Stop()
    {
        if (Interlocked.CompareExchange(ref this.isStarted, 0, 1) == 0)
        {
            throw new InvalidOperationException("The retriever is already stopped.");
        }
    }

    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
}

Building on the example above, one solution would be the following, though I think it adds too much complexity, or rather, odd code.

    private void RetrieveResources()
    {
        while (this.isDisposing == 0)
        {
            // Monitor.Wait must be called while holding the lock on the object.
            lock (this.pending)
            {
                while (this.isStarted == 0)
                {
                    Monitor.Wait(this.pending);
                }
            }

            Uri location = this.pending.Take();

            Task<T> task = new Task<T>((state) =>
                {
                    return this.RetrieveResource(state as Uri);
                }, location);

            task.ContinueWith((t) =>
                {
                    T result = t.Result;
                    RetrievalEventArgs<T> args = new RetrievalEventArgs<T>(location, result);

                    EventHandler<RetrievalEventArgs<T>> callback = this.Retrieved;
                    if (!Object.ReferenceEquals(callback, null))
                    {
                        callback(this, args);
                    }
                });

            task.Start();
        }
    }

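1 Answer:

Answer 0 (score: 2)

I think I have come up with a pretty good solution. I abstracted both the way a resource is retrieved and the representation of the result. This allows retrieving arbitrary URIs with arbitrary results; kind of like a URI-driven "ORM".

It supports a variable level of concurrency. When I posted the question the other day, I was forgetting that asynchrony and concurrency are quite different things. All I was achieving with tasks was asynchrony, while jamming up the task scheduler, when what I really wanted was concurrency.

I added cancellation because it seemed like a good idea for the start/stop functionality.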
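As a standalone illustration of why cancellation helps with stopping, the sketch below shows a thread blocked in BlockingCollection.Take(CancellationToken) being released when the token is cancelled. It is a reduced example with hypothetical names, not an excerpt of the class in this answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class CancelTakeSketch
{
    public static void Main()
    {
        BlockingCollection<Uri> pending = new BlockingCollection<Uri>();
        CancellationTokenSource cancellation = new CancellationTokenSource();

        Thread worker = new Thread(() =>
        {
            try
            {
                // The queue is empty, so this blocks until the token is cancelled.
                pending.Take(cancellation.Token);
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Take was cancelled");
            }
        });

        worker.Start();
        Thread.Sleep(100);     // give the worker time to block in Take
        cancellation.Cancel(); // unblocks the worker
        worker.Join();

        cancellation.Dispose();
        pending.Dispose();
    }
}
```

Without the token, a worker parked in Take() on an empty queue would have no way to notice a stop request until another item arrived.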

public abstract class Retriever<T> : IRetriever<T>
{
    private readonly object locker;
    private readonly BlockingCollection<Uri> pending;
    private readonly Thread[] threads;
    private CancellationTokenSource cancellation;

    private volatile int isStarted;
    private volatile int isDisposing;

    public event EventHandler<RetrieverEventArgs<T>> Retrieved;

    protected Retriever(int concurrency)
    {
        if (concurrency <= 0)
        {
            throw new ArgumentOutOfRangeException("concurrency", "The specified concurrency level must be greater than zero.");
        }

        this.locker = new object();
        this.pending = new BlockingCollection<Uri>(new ConcurrentQueue<Uri>());
        this.threads = new Thread[concurrency];
        this.cancellation = new CancellationTokenSource();

        this.isStarted = 0;
        this.isDisposing = 0;

        this.InitializeThreads();
    }

    ~Retriever()
    {
        this.Dispose(false);
    }

    private void InitializeThreads()
    {
        for (int i = 0; i < this.threads.Length; i++)
        {
            Thread thread = new Thread(this.ProcessQueue)
            {
                IsBackground = true
            };

            this.threads[i] = thread;
        }
    }

    private void StartThreads()
    {
        foreach (Thread thread in this.threads)
        {
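            // ThreadState is a flags enum: a background thread that has not
            // started reports Background | Unstarted, so test the bit rather
            // than comparing for equality.
            if ((thread.ThreadState & ThreadState.Unstarted) != 0)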
            {
                thread.Start();
            }
        }
    }

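    private void CancelOperations(bool reset)
    {
        // Swap in the replacement source before cancelling, so worker threads
        // are less likely to read the Token of an already-disposed source
        // (which throws ObjectDisposedException).
        CancellationTokenSource previous = this.cancellation;

        if (reset)
        {
            this.cancellation = new CancellationTokenSource();
        }

        previous.Cancel();
        previous.Dispose();
    }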

    private void WaitForThreadsToExit()
    {
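        foreach (Thread thread in this.threads)
        {
            // Join throws ThreadStateException for a thread that was never started.
            if ((thread.ThreadState & ThreadState.Unstarted) == 0)
            {
                thread.Join();
            }
        }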
    }

    private void ProcessQueue()
    {
        while (this.isDisposing == 0)
        {
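            // Monitor.Wait must be called while holding the lock. Also stop
            // waiting once disposal begins, so a paused thread can exit.
            lock (this.locker)
            {
                while (this.isStarted == 0 && this.isDisposing == 0)
                {
                    Monitor.Wait(this.locker);
                }
            }

            if (this.isDisposing != 0)
            {
                break;
            }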

            Uri location;

            try
            {
                location = this.pending.Take(this.cancellation.Token);
            }
            catch (OperationCanceledException)
            {
                continue;
            }

            T data;

            try
            {
                data = this.Retrieve(location, this.cancellation.Token);
            }
            catch (OperationCanceledException)
            {
                continue;
            }

            RetrieverEventArgs<T> args = new RetrieverEventArgs<T>(location, data);

            EventHandler<RetrieverEventArgs<T>> callback = this.Retrieved;
            if (!Object.ReferenceEquals(callback, null))
            {
                callback(this, args);
            }
        }
    }

    private void ThrowIfDisposed()
    {
        if (this.isDisposing == 1)
        {
            throw new ObjectDisposedException("Retriever");
        }
    }

    protected abstract T Retrieve(Uri location, CancellationToken token);

    protected virtual void Dispose(bool disposing)
    {
        if (Interlocked.CompareExchange(ref this.isDisposing, 1, 0) == 1)
        {
            return;
        }

        if (disposing)
        {
            this.CancelOperations(false);

            // Wake any threads parked in Monitor.Wait so they get a chance
            // to re-check their state.
            lock (this.locker)
            {
                Monitor.PulseAll(this.locker);
            }

            this.WaitForThreadsToExit();
            this.pending.Dispose();
        }
    }

    public void Start()
    {
        this.ThrowIfDisposed();

        if (Interlocked.CompareExchange(ref this.isStarted, 1, 0) == 1)
        {
            throw new InvalidOperationException("The retriever is already started.");
        }

        // Monitor.PulseAll must be called while holding the lock.
        lock (this.locker)
        {
            Monitor.PulseAll(this.locker);
        }

        this.StartThreads();
    }

    public void Add(Uri location)
    {
        this.pending.Add(location);
    }

    public void Stop()
    {
        this.ThrowIfDisposed();

        if (Interlocked.CompareExchange(ref this.isStarted, 0, 1) == 0)
        {
            throw new InvalidOperationException("The retriever is already stopped.");
        }

        this.CancelOperations(true);
    }

    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
}
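
For the HTTP case, the body of a Retrieve(Uri, CancellationToken) override might look roughly like the sketch below. It is written as a standalone static method because IRetriever<T> and RetrieverEventArgs<T> are not shown in the post. The class name, the connection limit of 100 and the choice of HttpWebRequest are assumptions for illustration. As far as I know, .NET Framework client applications default to two concurrent connections per host, which would throttle hundreds of parallel requests unless ServicePointManager.DefaultConnectionLimit is raised.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

public static class HttpRetrieveSketch
{
    // Hypothetical HTTP implementation of a Retrieve(Uri, CancellationToken) method.
    // Assumes the URI uses the http or https scheme; otherwise the cast fails.
    public static string Retrieve(Uri location, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(location);

        // Abort the in-flight request if the token is cancelled.
        using (token.Register(request.Abort))
        {
            try
            {
                using (WebResponse response = request.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // Report an abort caused by cancellation as a cancellation;
                // rethrow any other failure unchanged.
                token.ThrowIfCancellationRequested();
                throw;
            }
        }
    }

    public static void Main(string[] args)
    {
        // Assumed value; raise the per-host limit before issuing requests.
        ServicePointManager.DefaultConnectionLimit = 100;

        if (args.Length == 0)
        {
            Console.WriteLine("Pass a URL as the first argument.");
            return;
        }

        using (CancellationTokenSource cancellation = new CancellationTokenSource())
        {
            string body = Retrieve(new Uri(args[0]), cancellation.Token);
            Console.WriteLine(body.Length);
        }
    }
}
```

Throwing OperationCanceledException on cancellation matches what the ProcessQueue loop above catches, so a stopped retriever would skip the aborted item rather than raise Retrieved for it.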