Long-running thread work killed by RunningWaitCallback()

Date: 2014-07-28 06:59:01

Tags: c# .net multithreading service threadpool

I have a service that "hosts" other assemblies and lets them process tasks. Here is a code excerpt:

public void Start()
{
    Log.Instance.Info("Trying runtime to start");
    // Loading all modules
    _workerLoader = new WorkerLoader();
    Log.Instance.Info("Trying to load workers");
    _workerLoader.CreateInstances();
    _tokenSource = new CancellationTokenSource();

    foreach (var worker in _workerLoader.Modules)
    {
        Log.Instance.Info("Adding {0} to global scope", worker.Id);
        var currentWorker = worker;
        _tasks.Add(Task.Factory.StartNew(() => BaseWork(currentWorker, _tokenSource.Token), _tokenSource.Token));
        Thread.Sleep(3000);
    }
    Log.Instance.Info("Runtime started");
}

private void BaseWork(IWorker worker, CancellationToken token)
{
    using (worker)
    {
        worker.WorkerStopped += (sender, args) =>
        {
            var stackTrace = new StackTrace();
            var stackFrames = stackTrace.GetFrames();
            var strFrames = "";
            if (stackFrames != null)
            {
                strFrames =
                    String.Join(Environment.NewLine, stackFrames.Select(
                        x =>
                            String.Format("{0} {1} ({2}:{3})", x.GetMethod(), x.GetFileName(), x.GetFileLineNumber(), x.GetFileColumnNumber())));
            }

            Log.Instance.Info("[{0}] Worker stopped. ({1})", worker.Id, strFrames);
        };
        worker.TaskStarted += (sender, info) => Log.Instance.Info("[{0}] Started: {1}", ((IWorker)sender).Id, info.id);
        worker.TaskFinished += (sender, info) => Log.Instance.Info("[{0}] Finished: {1}", ((IWorker)sender).Id, info.id);
        worker.ErrorOccurred += (sender, exception) => Log.Instance.Error("[{0}] Error: {1}", ((IWorker)sender).Id, exception);
        while (true)
        {
            if (token.IsCancellationRequested)
                token.ThrowIfCancellationRequested();

            worker.ProcessOnce();
        }
    }
}

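worker.ProcessOnce() is where the worker does all the necessary work, such as connecting to a remote website, reading data from the database, writing to the database, and so on. At the moment there is only one worker.

With that explanation out of the way, here is the problem.

After some time of normal operation, the worker stops and writes an entry about it to the log file. It happens at random. To capture the stack trace, I injected the code you can see in the "worker stopped" event handler, and this is what I got: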

Worker stopped. (Void <BaseWork>b__3(System.Object, System.EventArgs)  (0:0)
Void OnWorkerStopped()  (0:0)
Void Dispose()  (0:0)
Void System.IDisposable.Dispose()  (0:0)
Void BaseWork(YellowPages.Contracts.IWorker, System.Threading.CancellationToken)  (0:0)
Void <Start>b__0()  (0:0)
Void InnerInvoke()  (0:0)
Void Execute()  (0:0)
Void ExecutionContextCallback(System.Object)  (0:0)
Void RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)  (0:0)
Void Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)  (0:0)
Void ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)  (0:0)
Boolean ExecuteEntry(Boolean)  (0:0)
Void System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()  (0:0)
Boolean Dispatch()  (0:0)
Boolean PerformWaitCallback()  (0:0))

Am I doing something wrong? What is PerformWaitCallback? It seems I'm setting up the task for the thread pool incorrectly.

Any ideas?

Thanks in advance!

2 answers:

Answer 0: (score: 1)

The entire part of the stack trace below BaseWork is just the Task's delegate being queued to the ThreadPool. If you had simply written

Log.Instance.Info(new StackTrace().ToString())

you would probably get a more detailed listing, something like:

// this is the anonymous handler delegate where you print the stack trace
<BaseWork>b__3(System.Object, System.EventArgs)

// your worker fires the WorkerStopped event inside the Dispose method
SomeClass.OnWorkerStopped()

// this is where your worker was disposed
YellowPages.Contracts.Worker.Dispose()

// this is where the method was invoked
SomeClass.BaseWork(YellowPages.Contracts.IWorker, System.Threading.CancellationToken)

// everything below this line is just your method being queued to the thread pool
// and is irrelevant
mscorlib.dll!System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
mscorlib.dll!System.Threading.Tasks.Task.Execute()
mscorlib.dll!System.Threading.Tasks.Task.ExecutionContextCallback(object obj = {unknown})
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext = {unknown}, System.Threading.ContextCallback callback = {unknown}, object state = {unknown}, bool preserveSyncCtx = {unknown})
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext = {unknown}, System.Threading.ContextCallback callback = {unknown}, object state = {unknown}, bool preserveSyncCtx = {unknown})
mscorlib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot = {unknown})
mscorlib.dll!System.Threading.Tasks.Task.ExecuteEntry(bool bPreventDoubleExecution = {unknown})
mscorlib.dll!System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()
mscorlib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Your actual problem is (most likely) that worker.ProcessOnce() throws an exception:

  1. worker.ProcessOnce() throws an exception and nothing catches it.
  2. The using block in BaseWork ensures that worker.Dispose() is called in its implicit finally clause.
  3. worker.Dispose() fires the WorkerStopped event, whose handler logs the stack trace.
  4. Since there is no catch clause to catch the exception, the Task silently swallows it (unless you are using a .NET version earlier than 4.5 or have the ThrowUnobservedTaskExceptions option enabled).
  5. If you had called Task.Wait() somewhere, it would block that thread and then throw an AggregateException once the worker threw. Since you probably don't wait anywhere, you can create a continuation that fires only when an exception occurs, i.e.:

    _tasks.Add(Task.Factory
        // create the task
        .StartNew(() => BaseWork(currentWorker, _tokenSource.Token), _tokenSource.Token)
        // create the exception-only continuation
        .ContinueWith(t => Log.Instance.Error(t.Exception.Flatten().ToString()), TaskContinuationOptions.OnlyOnFaulted)
    );

    Alternatively, you can add a try/catch block around the call to worker.ProcessOnce() and log the exception there (but the approach shown above catches any exception thrown inside BaseWork, so it's the safer one to use).

    Adding this will let you log the actual exception. If it is caused by a bug in your code, fix it. Otherwise (if it is an exception that can be expected, such as a socket exception), add a catch handler for that exception and swallow it.

    Also, it's not clear what the WorkerStopped event is supposed to signal (presumably not an exception)?
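The sequence described in this answer can be reproduced in isolation. The following is only a minimal sketch: DemoWorker and FaultedTaskDemo are hypothetical stand-ins (the asker's real IWorker implementation is not shown), and the worker is assumed to raise WorkerStopped from Dispose(), as the stack trace in the question suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the asker's worker; not the real YellowPages type.
public sealed class DemoWorker : IDisposable
{
    public event EventHandler WorkerStopped;

    // Simulates ProcessOnce() failing, e.g. on a dropped connection.
    public void ProcessOnce()
    {
        throw new InvalidOperationException("simulated failure");
    }

    // Assumption: like the worker in the question, raises WorkerStopped on dispose.
    public void Dispose()
    {
        var handler = WorkerStopped;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class FaultedTaskDemo
{
    // Mirrors the shape of BaseWork in the question: a using block around an endless loop.
    private static void BaseWork(List<string> events)
    {
        using (var worker = new DemoWorker())
        {
            worker.WorkerStopped += (sender, args) => events.Add("WorkerStopped");
            while (true)
            {
                worker.ProcessOnce(); // throws; the using block's finally calls Dispose()
            }
        }
    }

    // Returns the recorded events in the order they happened.
    public static List<string> Run()
    {
        var events = new List<string>();
        Task work = Task.Factory.StartNew(() => BaseWork(events));

        // Runs only if the task faulted, so the exception is observed and recorded.
        Task logFault = work.ContinueWith(
            t => events.Add("Faulted: " + t.Exception.Flatten().InnerException.Message),
            TaskContinuationOptions.OnlyOnFaulted);

        logFault.Wait();
        return events;
    }
}
```

Calling FaultedTaskDemo.Run() should return "WorkerStopped" followed by "Faulted: simulated failure". Without the continuation only the first entry would appear, which matches the symptom in the question: the worker reports that it stopped, but the exception that caused it never reaches the log.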

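Answer 1: (score: 0)

Just surround worker.ProcessOnce(); with a try/catch block and make sure it does not throw any exceptions. If you find that it doesn't, I strongly suspect that your logging system is crashing inside Log.Instance.Info. Double-check the code of your logging class and consider using a lock to prevent multiple threads from accessing the same variable at the same time.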

lock(myobject)
{
   .....
}
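Both suggestions in this answer can be combined in one place. This is a rough sketch under stated assumptions: LockedLog and GuardedLoop are hypothetical names, the asker's Log.Instance is not shown and may already be thread-safe, and catching the general Exception type is for illustration only (in real code you would probably narrow it to the failures you expect).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical logger that serializes access to its state with a lock.
public sealed class LockedLog
{
    private readonly object _sync = new object();
    private readonly List<string> _lines = new List<string>();

    public void Error(string message)
    {
        lock (_sync) { _lines.Add(message); }
    }

    // Returns a copy so callers never touch the shared list directly.
    public string[] Snapshot()
    {
        lock (_sync) { return _lines.ToArray(); }
    }
}

public static class GuardedLoop
{
    // Calls processOnce until cancellation is requested; failures are logged
    // and the loop keeps going instead of ending the task.
    public static void Run(Action processOnce, LockedLog log, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                processOnce();
            }
            catch (OperationCanceledException)
            {
                throw; // let cancellation propagate to the task
            }
            catch (Exception ex)
            {
                log.Error(ex.ToString());
            }
        }
    }
}
```

In the question's code, processOnce would be something like () => worker.ProcessOnce(). Note that this version keeps the worker alive after a failure, which may or may not be what you want. If a failed worker should stop, log the exception and rethrow it instead.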