Parallelism with LINQ

Asked: 2015-02-27 16:29:30

Tags: c# linq foreach parallel-processing parallel.foreach

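I have a foreach loop over the results of a LINQ query. I am trying to make it run faster (it takes about an hour to run), but when I convert it to Parallel.ForEach I get different results than with the standard foreach, even though it cuts the run time in half. Could someone more experienced with LINQ and parallelism help me with this?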

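I would really like some way to speed this up, and I am a bit confused about why Parallel.ForEach does not give me the same results. Maybe someone smarter than me can fill me in.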

The standard foreach:

var studentTestGroup = from st in this
                       group st by new { st.TestName, st.STI }
                       into studentGroups
                       select new { TestName = studentGroups.Key.TestName, STI = studentGroups.Key.STI, students = studentGroups };

// Loop through each group that has more than one test, or where there exist any retests at all.
foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
{
    if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") == 1)
    {
        // For a test name and STI, if there exists a retest and only 1 non-retest, keep the highest and discard the rest.
        var studentToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).First();

        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey);
    }
    else if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") > 1)
    {
        // For a test name and STI, if there exists a retest and more than 1 non-retest,
        // then keep the highest (number of non-retests) scores and discard the rest.
        int numRetests = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
        var studentsToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).Take(numRetests);

        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey));
    }
    else if (studentGroup.students.Any(t => t.Retest == "Y") && !studentGroup.students.Any(t => t.Retest == "N" || t.Retest == ""))
    {
        // Only retests exist for this test name and STI: discard scores below 400.
        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400);
    }
}

The part I converted to a parallel foreach:

Parallel.ForEach (studentTestGroup.AsParallel().Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")).AsParallel(), studentGroup =>
            {

2 answers:

Answer 0 (score: 3)

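Don't combine PLINQ (AsParallel) with the TPL (Parallel.ForEach). It can even slow things down, because you overload the thread pool. Use one technique or the other. The most you can gain from parallelism is a speedup of roughly the number of CPU cores. Beyond that, use a profiler; only you know your collection well enough to apply heuristics to it.

As for the code you provided: don't repeat yourself! For example: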

studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "")

appears in all the different conditions; compute it once instead of evaluating it every time. The same goes for:

studentGroup.students.Any(t => t.Retest == "Y")

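"Any" iterates over the collection until the predicate matches, so don't iterate a large collection several times for if statements that share the same condition!

Think about your collection: maybe you could use a dictionary or some other structure to look items up, but as I said, that is more a matter of heuristics about your own collection and may give some speedup. Hope this helps. If you want more, you need a profiler.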
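One way to read the "dictionary or some other structure" advice above is to materialize the keys to keep into a HashSet once, so that RemoveAll does a constant-time lookup per element instead of re-running a lazy sorted query for each one. The following is only a minimal sketch under assumptions: the Student class, its int-typed TestScaledScore and PrimaryKey, and the KeepTopScores helper are hypothetical names chosen to mirror the question, not the asker's actual types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; property names mirror the question's code.
// The real property types are not shown in the question (assumed here).
public class Student
{
    public int PrimaryKey { get; set; }
    public string TestName { get; set; } = "";
    public string STI { get; set; } = "";
    public string Retest { get; set; } = "";
    public int TestScaledScore { get; set; }
}

public static class KeepSetSketch
{
    // Keeps the `keepCount` highest-scoring records of one (TestName, STI)
    // group and removes the group's other records from `all`.
    public static void KeepTopScores(List<Student> all, string testName, string sti, int keepCount)
    {
        // Materialize the keys once. Leaving this as a lazy query would
        // re-sort the group for every element that RemoveAll inspects.
        var keysToKeep = new HashSet<int>(
            all.Where(s => s.TestName == testName && s.STI == sti)
               .OrderByDescending(s => s.TestScaledScore)
               .Take(keepCount)
               .Select(s => s.PrimaryKey));

        all.RemoveAll(s => s.TestName == testName
                        && s.STI == sti
                        && !keysToKeep.Contains(s.PrimaryKey));
    }

    public static void Main()
    {
        var all = new List<Student>
        {
            new Student { PrimaryKey = 1, TestName = "Math", STI = "A1", Retest = "N", TestScaledScore = 350 },
            new Student { PrimaryKey = 2, TestName = "Math", STI = "A1", Retest = "Y", TestScaledScore = 420 },
            new Student { PrimaryKey = 3, TestName = "Math", STI = "A1", Retest = "N", TestScaledScore = 410 },
            new Student { PrimaryKey = 4, TestName = "Reading", STI = "A1", Retest = "N", TestScaledScore = 300 },
        };

        KeepTopScores(all, "Math", "A1", 2);

        // Prints "2,3,4": record 1 is the lowest Math score and is removed.
        Console.WriteLine(string.Join(",", all.Select(s => s.PrimaryKey)));
    }
}
```

Whether this helps in practice depends on group sizes, so it is worth confirming with a profiler as suggested above.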

Answer 1 (score: 1)

What type is this?

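I think there is a race condition on this.RemoveAll(). If a list/collection is modified from several threads at the same time, the result of the operations on it is undefined. In this case you could put a lock statement around the RemoveAll() calls, but then the benefit of the parallel foreach would be lost.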

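Another possibility is to remember all the items that should be removed and remove them after the foreach. Note that adding to a plain List<T> from several threads is not safe either, so in a parallel loop the collected items would need to go into a thread-safe collection such as ConcurrentBag<T>.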
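The "collect first, remove afterwards" idea can be sketched with Parallel.ForEach alone (no AsParallel), a ConcurrentBag for the collected keys, and a single RemoveAll once the loop has finished. This is a sketch under assumptions rather than a drop-in replacement: Student, its int-typed TestScaledScore and PrimaryKey, and RemoveSupersededScores are hypothetical names, and the three branches of the question are folded into one rule that should be checked against the real data.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical record type; property names mirror the question's code.
public class Student
{
    public int PrimaryKey { get; set; }
    public string TestName { get; set; } = "";
    public string STI { get; set; } = "";
    public string Retest { get; set; } = "";
    public int TestScaledScore { get; set; }
}

public static class DeferredRemovalSketch
{
    public static void RemoveSupersededScores(List<Student> all)
    {
        // Materialize the groups before the parallel loop starts, so no
        // thread enumerates a lazy query over the list.
        // Every branch in the question requires at least one retest.
        var groups = all
            .GroupBy(s => new { s.TestName, s.STI })
            .Select(g => g.ToList())
            .Where(g => g.Any(s => s.Retest == "Y"))
            .ToList();

        // Thread-safe container: worker threads only add keys here and
        // never touch `all`.
        var keysToRemove = new ConcurrentBag<int>();

        Parallel.ForEach(groups, studentGroup =>
        {
            int nonRetests = studentGroup.Count(s => s.Retest == "N" || s.Retest == "");

            // No non-retests: drop scores below 400.
            // Otherwise: keep the `nonRetests` highest scores, drop the rest.
            IEnumerable<Student> toDrop = nonRetests == 0
                ? studentGroup.Where(s => s.TestScaledScore < 400)
                : studentGroup.OrderByDescending(s => s.TestScaledScore).Skip(nonRetests);

            foreach (var s in toDrop)
            {
                keysToRemove.Add(s.PrimaryKey);
            }
        });

        // Single-threaded removal in one pass after the loop has completed.
        var removeSet = new HashSet<int>(keysToRemove);
        all.RemoveAll(s => removeSet.Contains(s.PrimaryKey));
    }

    public static void Main()
    {
        var all = new List<Student>
        {
            new Student { PrimaryKey = 1, TestName = "Math", STI = "A1", Retest = "N", TestScaledScore = 350 },
            new Student { PrimaryKey = 2, TestName = "Math", STI = "A1", Retest = "Y", TestScaledScore = 420 },
            new Student { PrimaryKey = 3, TestName = "Reading", STI = "A1", Retest = "Y", TestScaledScore = 380 },
            new Student { PrimaryKey = 4, TestName = "Reading", STI = "A1", Retest = "Y", TestScaledScore = 410 },
            new Student { PrimaryKey = 5, TestName = "Science", STI = "A1", Retest = "N", TestScaledScore = 300 },
        };

        RemoveSupersededScores(all);

        // Prints "2,4,5".
        Console.WriteLine(string.Join(",", all.Select(s => s.PrimaryKey)));
    }
}
```

Because the list is only read inside the loop and only modified after it, the result no longer depends on thread timing, which is the likely reason the parallel version in the question produced different results.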

Edit: This might be a faster implementation for removing the specified items:

// Student is a placeholder for the element type of this collection (the type is not shown in the question).
var itemsToRemove = new List<Student>();
foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
{
    // Evaluate each predicate once per group instead of once per branch.
    int countNo = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
    bool anyYes = studentGroup.students.Any(t => t.Retest == "Y");
    if (anyYes && countNo == 1)
    {
        // Note: this keeps the single non-retest record, whereas the question keeps the highest score.
        var studentToKeep = studentGroup.students.Single(t => t.Retest == "N" || t.Retest == "");

        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey));
    }
    else if (anyYes && countNo > 1)
    {
        // Materialize the list so the query is not re-evaluated for every element of this.
        var studentsToKeep = studentGroup.students.Where(t => t.Retest == "N" || t.Retest == "").ToList();

        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey)));
    }
    else if (anyYes && countNo == 0)
    {
        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400));
    }
}
// Remove only after the loop, so the collection is never modified while it is being queried.
foreach (var itemToRemove in itemsToRemove)
{
    this.Remove(itemToRemove);
}