Speeding up heavy work inside a loop with multithreading

Time: 2012-08-29 08:55:52

Tags: c# multithreading task-parallel-library

I have a problem with my data processing. The records are currently parsed one at a time:

public void ParseDetails()
{            
    for (int i = 0; i < mListAppInfo.Count; ++i)
    {
        ParseOneDetail(i);
    }
}

For 300 records, this usually takes about 13-15 minutes.

I tried to speed it up with Parallel.For(), but the run always stalls at some point.

public void ParseDetails()
{
    Parallel.For(0, mListAppInfo.Count, i => ParseOneDetail(i));
}

Inside ParseOneDetail(int index) I write a log line so I can track which record ID is being processed. The run always hangs somewhere, and I don't know why...

ParseOneDetail(): 89 ...
ParseOneDetail(): 90 ...
ParseOneDetail(): 243 ...
ParseOneDetail(): 92 ...
ParseOneDetail(): 244 ...
ParseOneDetail(): 93 ...
ParseOneDetail(): 245 ...
ParseOneDetail(): 247 ...
ParseOneDetail(): 94 ...
ParseOneDetail(): 248 ...
ParseOneDetail(): 95 ...
ParseOneDetail(): 99 ...
ParseOneDetail(): 249 ...
ParseOneDetail(): 100 ...
_ <hang at this point>

Any help or suggestions for improvement would be appreciated. Thanks!

Edit 1: Here is the method:

private void ParseOneDetail(int index)
{
    Console.WriteLine("ParseOneDetail(): " + index + " ... ");
    ApplicationInfo appInfo = mListAppInfo[index];

    var htmlWeb = new HtmlWeb();
    var document = htmlWeb.Load(appInfo.AppAnnieURL);

    // get first one only
    HtmlNode nodeStoreURL = document.DocumentNode.SelectSingleNode(Constants.XPATH_FIRST);
    appInfo.StoreURL = nodeStoreURL.Attributes[Constants.HREF].Value;
}

Edit 2: This is the error output after running Enigmativity's suggestion for a while:

ParseOneDetail(): 234 ...
ParseOneDetail(): 87 ...
ParseOneDetail(): 235 ...
ParseOneDetail(): 236 ...
ParseOneDetail(): 88 ...
ParseOneDetail(): 238 ...
ParseOneDetail(): 89 ...
ParseOneDetail(): 90 ...
ParseOneDetail(): 239 ...
ParseOneDetail(): 92 ...

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1355
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1479
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1103
   at HtmlAgilityPack.HtmlWeb.Load(String url) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1061
   at SimpleChartParser.AppAnnieParser.ParseOneDetail(ApplicationInfo appInfo) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 90
   at SimpleChartParser.AppAnnieParser.<ParseDetails>b__0(ApplicationInfo ai) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 80
   at System.Threading.Tasks.Parallel.<>c__DisplayClass21`2.<ForEachWorker>b__17(Int32 i)
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass7.<ExecuteSelfReplicating>b__6(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](TSource[] array, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
   at SimpleChartParser.AppAnnieParser.ParseDetails() in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 80
   at SimpleChartParser.Program.Main(String[] args) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\Program.cs:line 15

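2 answers:

Answer 0: (score: 0)

This isn't an answer as such, but could you try changing your code to work like this and let us know whether you still get the error?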

public void ParseDetails()
{
    Parallel.ForEach(mListAppInfo.ToArray(), ai => ParseOneDetail(ai));
}

private void ParseOneDetail(ApplicationInfo appInfo)
{
    var htmlWeb = new HtmlWeb();
    var document = htmlWeb.Load(appInfo.AppAnnieURL);

    // get first one only
    HtmlNode nodeStoreURL = 
            document.DocumentNode.SelectSingleNode(Constants.XPATH_FIRST);
    appInfo.StoreURL = nodeStoreURL.Attributes[Constants.HREF].Value;
}
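
The timeout in Edit 2 suggests, though it does not prove, that too many requests are in flight at once. Below is a minimal sketch of one way to test that idea: cap the parallelism with ParallelOptions.MaxDegreeOfParallelism and catch exceptions per item, so that a single timed-out request does not abort the whole loop. The ParseAll helper, the class name, and the limit of 4 are assumptions for illustration and are not part of the original code; the lambda in Main stands in for the real ParseOneDetail.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ThrottledParseSketch
{
    // Hypothetical helper: runs parseOne over every item with capped parallelism
    // and returns the failures instead of letting one exception stop the loop.
    public static IList<KeyValuePair<T, Exception>> ParseAll<T>(
        IEnumerable<T> items, Action<T> parseOne, int maxParallel)
    {
        var failures = new ConcurrentQueue<KeyValuePair<T, Exception>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            try
            {
                parseOne(item);
            }
            catch (Exception ex)
            {
                // Record the failure so the item can be retried or reported later.
                failures.Enqueue(new KeyValuePair<T, Exception>(item, ex));
            }
        });

        return failures.ToArray();
    }

    static void Main()
    {
        var ids = new List<int> { 1, 2, 3, 4, 5, 6 };

        var failures = ParseAll(ids, id =>
        {
            // Stand-in for the real page fetch: every third id fails
            // the way a timed-out request would.
            if (id % 3 == 0)
                throw new TimeoutException("simulated timeout for " + id);
        }, 4);

        foreach (var failure in failures)
            Console.WriteLine("Failed " + failure.Key + ": " + failure.Value.Message);
    }
}
```

In the real program the call might look like ParseAll(mListAppInfo.ToArray(), ParseOneDetail, 4), followed by a retry of whatever comes back in the failure list. It may also be worth checking ServicePointManager.DefaultConnectionLimit: on .NET Framework, client applications default to 2 concurrent connections per host, so additional parallel requests to the same site queue up and can time out while waiting.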

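Answer 1: (score: 0)

I would comment out every line in ParseOneDetail and check whether the run then completes.

If it does (and it should), uncomment the first half and check whether it still completes. Keep bisecting like this until you know exactly which line stops the run. Something on that line is probably not thread-safe.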
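
As an alternative to commenting lines out, each step can be wrapped in a small tracing helper, so that the last "start" line without a matching "end" line points at the step that hangs. This is only a sketch: the Trace helper and the step names are hypothetical, and the lambdas in Main stand in for the real htmlWeb.Load and SelectSingleNode calls.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class StepTraceSketch
{
    // Hypothetical helper: logs when a named step starts and ends,
    // together with the record index, the thread id, and the elapsed time.
    public static T Trace<T>(string step, int index, Func<T> body)
    {
        int thread = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine("[" + index + "] thread " + thread + " start " + step);
        var watch = Stopwatch.StartNew();
        try
        {
            return body();
        }
        finally
        {
            // Runs on success and on exception; a step that truly hangs never gets here.
            Console.WriteLine("[" + index + "] thread " + thread + " end " + step
                + " after " + watch.ElapsedMilliseconds + " ms");
        }
    }

    static void Main()
    {
        // Stand-in for the page download.
        string html = Trace("load", 0, () =>
        {
            Thread.Sleep(50);
            return "<a href='x'></a>";
        });

        // Stand-in for the XPath lookup.
        int length = Trace("parse", 0, () => html.Length);

        Console.WriteLine("parsed " + length + " characters");
    }
}
```

Because every log line carries the record index and thread id, the interleaved output from Parallel.For stays readable, and unusually long "load" times would point toward the network rather than a thread-safety problem.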