Optimizing a foreach loop in C#: add threads?

Posted: 2015-05-04 07:54:13

Tags: c# wpf multithreading math foreach

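So I have built a mail filtering program, and so far it only works in a "test" environment. When I tried it on a real data set, I waited an hour, and it might have needed another 10 hours to produce a result.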

Here is my loop:

foreach (var word in mail)
{
    foreach (var wordInSpam in countsWordOccurenceSpam)
    {
        foreach (var wordInOk in countsWordOccurenceOk)
        {
            if (countsWordOccurenceOk.ContainsKey(word.Key) && countsWordOccurenceSpam.ContainsKey(word.Key))
            {
                if (word.Key == wordInOk.Key && word.Key == wordInSpam.Key)
                {
                    //math
                }
            }
            else if (countsWordOccurenceOk.ContainsKey(word.Key) && (!countsWordOccurenceSpam.ContainsKey(word.Key)))
            {
                if (word.Key == wordInOk.Key)
                {
                    //math
                }
            }
            else if (countsWordOccurenceSpam.ContainsKey(word.Key) && (!countsWordOccurenceOk.ContainsKey(word.Key)))
            {
                if (word.Key == wordInSpam.Key)
                {
                    //math
                }
            }
            else
            {
                //math
            }
        }
    }
}

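mail is a dictionary for the mail "to check", holding its words and their counters; countsWordOccurenceSpam / Ok are dictionaries built from many mails, holding words and their counters.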

It looks like this:

if (openFileDialog.ShowDialog() == true)
{
    foreach (string filename in openFileDialog.FileNames)
    {
        myOkMail.Add(filename);
    }
}

string[] okFiles = myOkMail.ToArray();

var logFile2 = okFiles
    .SelectMany(i => System.IO.File.ReadAllLines(i)).ToList();

countsWordOccurenceOk = okFiles
    .SelectMany(i => System.IO.File.ReadAllLines(i)
        .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!', '.' }, StringSplitOptions.RemoveEmptyEntries))
        .Distinct())
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());
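
To make the shape of these dictionaries concrete, here is a minimal sketch of the same query run on in-memory lines instead of files. The `WordCountDemo` class, the `CountFilesPerWord` method and the sample texts are made up for illustration. Because `Distinct()` is applied per file, each value ends up as the number of files that contain the word, not the total number of occurrences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountDemo
{
    // Same query shape as above, but each "file" is an in-memory array of lines
    // (an assumption made so the sketch runs without touching the disk).
    public static Dictionary<string, int> CountFilesPerWord(IEnumerable<string[]> files)
    {
        var separators = new[] { ' ', ',', '.', '?', '!' };
        return files
            .SelectMany(lines => lines
                .SelectMany(line => line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
                .Distinct())                      // each word is kept once per file
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For two "files" containing "buy now, buy cheap" and "buy later", the entry for "buy" would be 2 (two files), even though the word occurs three times.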

When I test with 50 mails the program runs perfectly, but with 50k spam mails and 50k ham mails... not so much. CPU usage is only around 10%.

Also worth noting: the "math" part is almost identical in each of the checked cases and looks like this:

                else if (countsWordOccurenceSpam.ContainsKey(word.Key) && (!countsWordOccurenceOk.ContainsKey(word.Key)))
                {
                    if (word.Key == wordInSpam.Key)
                    {
                        totals = wordInSpam.Value;

                        fprob_spam = ((double)wordInSpam.Value) / ile_spam;

                        sum_spam = (((weight * probability) + (totals * fprob_spam)) / (totals + weight));
                        sum_ok = ((weight * probability) / (totals + weight)); 

                        sum_spam = Math.Pow(sum_spam, word.Value);
                        sum_ok = Math.Pow(sum_ok, word.Value);

                        cos = countsWordOccurenceOk.Count;
                        wp_spam = Math.Pow(sum_spam, (1/cos));
                        last_o = Math.Pow(sum_ok, (1 / cos));

                        wp_spam_1 = wp_spam_1 * wp_spam;
                        last_o_1 = last_o_1 * last_o;

                    }
                }

Yes, it looks terrible. Also, there is one thing I still haven't dealt with, which I have to use to get correct results:

                        cos = countsWordOccurenceOk.Count;
                        wp_spam = Math.Pow(sum_spam, (1/cos));
                        last_o = Math.Pow(sum_ok, (1 / cos));

because it gets multiplied by the number of words in the database.
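
As far as the nested loops suggest, the root is needed because, in the branch where the word exists only in the spam dictionary, the math runs once for every entry of `countsWordOccurenceOk`, so the running product picks up the same factor `cos` times. Taking the `cos`-th root of the factor first cancels that repetition. A small sketch of the arithmetic follows; the `RootCompensationDemo` class is hypothetical, and `count` is assumed to be an `int` here (the declaration of `cos` is not shown), hence `1.0 / count`.

```csharp
using System;

public static class RootCompensationDemo
{
    // Multiplies the count-th root of factor into a product count times,
    // which gives back factor itself (up to floating-point rounding).
    public static double MultiplyRootRepeatedly(double factor, int count)
    {
        // With two int operands, 1 / count would be integer division
        // (0 whenever count > 1), so the exponent is written as 1.0 / count.
        double root = Math.Pow(factor, 1.0 / count);
        double product = 1.0;
        for (int i = 0; i < count; i++)
        {
            product *= root;
        }
        return product;
    }
}
```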

Any help is appreciated, 健一

1 answer:

Answer 0: (score: 0)

One simple thing you could try is Parallel.ForEach (MSDN), which can run the iterations of a loop on different threads.
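
One caveat: the math in the question multiplies into shared running totals (`wp_spam_1`, `last_o_1`), and unsynchronized writes from several threads would corrupt them. A minimal sketch of one way around that, using the `Parallel.ForEach` overload with a per-thread local value; the `ParallelProductDemo` class and the `factorFor` delegate are hypothetical stand-ins for the question's math, and `mail` is assumed to be a `Dictionary<string, int>`:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelProductDemo
{
    // Multiplies a per-word factor over all words of the mail in parallel.
    // factorFor is a hypothetical placeholder for the "math" part.
    public static double Product(
        Dictionary<string, int> mail,
        Func<KeyValuePair<string, int>, double> factorFor)
    {
        double total = 1.0;
        object gate = new object();

        Parallel.ForEach(
            mail,
            () => 1.0,                                           // per-thread running product
            (word, loopState, local) => local * factorFor(word), // touches no shared state
            local => { lock (gate) { total *= local; } });       // merge under a lock

        return total;
    }
}
```

Reading the dictionaries from several threads is fine as long as nothing modifies them during the loop. The order of the multiplications is not fixed, so results can differ in the last few bits between runs.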

You could try replacing the outer foreach and see whether you notice any performance difference. It should look something like this:

Parallel.ForEach(mail, this.DoWork);

Then your DoWork method can run the next loop:

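// mail is a dictionary, so each element is a key/value pair, not a string
// (the value type is assumed to be int here).
// Note: running totals shared between iterations need synchronization.
public void DoWork(KeyValuePair<string, int> word)
{
    foreach (var wordInSpam in countsWordOccurenceSpam)
    {
       ...
    }
}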
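
Separately from threading, and as a swapped-in technique rather than part of the Parallel.ForEach suggestion: the two inner loops in the question only ever act on the entry whose key equals `word.Key`, so a dictionary lookup can find that entry directly instead of scanning every pair of entries. A sketch, assuming both count dictionaries are `Dictionary<string, int>`; the `WordLookup` class, the `Match` enum and the `Classify` method are hypothetical names, and the math itself is left out:

```csharp
using System;
using System.Collections.Generic;

public static class WordLookup
{
    public enum Match { Both, OnlyOk, OnlySpam, Neither }

    // Finds the entries for one word with two hash lookups instead of two loops.
    public static Match Classify(
        string word,
        Dictionary<string, int> countsOk,
        Dictionary<string, int> countsSpam,
        out int okCount,
        out int spamCount)
    {
        bool inOk = countsOk.TryGetValue(word, out okCount);       // okCount is 0 if absent
        bool inSpam = countsSpam.TryGetValue(word, out spamCount); // spamCount is 0 if absent

        if (inOk && inSpam) return Match.Both;
        if (inOk) return Match.OnlyOk;
        if (inSpam) return Match.OnlySpam;
        return Match.Neither;
    }
}
```

Inside the outer loop, `okCount` and `spamCount` would take the place of `wordInOk.Value` and `wordInSpam.Value`. Note that this runs each branch exactly once per word, whereas the nested loops repeat some branches once per dictionary entry, so the root compensation from the question would no longer apply.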