Question

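I have a foreach loop over the results of a LINQ query. I am trying to make it run faster (it takes about an hour), but when I convert it to Parallel.ForEach I get different results than with the standard foreach, even though it cuts the run time in half. Could someone more experienced with LINQ and parallelism help me figure this out?

I would really like some way to speed this up, though I am also confused about why Parallel.ForEach does not give me the same results. Maybe someone smarter than me can fill me in.

The standard foreach: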

var studentTestGroup = from st in this
                       group st by new { st.TestName, st.STI }
                       into studentGroups
                       select new { TestName = studentGroups.Key.TestName, STI = studentGroups.Key.STI, students = studentGroups };
// Loop through each group that has more than one test, or where there exists any retest at all.
foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
{
    if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") == 1)
    {
        // For a test name and STI, if there exists a retest and only 1 non-retest, keep the highest and discard the rest.
        var studentToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).First();

        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey);
    }
    else if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") > 1)
    {
        // For a test name and STI, if there exists a retest and more than 1 non-retest,
        // then keep the highest (number of non-retests) scores and discard the rest.
        int numRetests = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
        var studentsToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).Take(numRetests);

        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey));
    }
    else if (studentGroup.students.Any(t => t.Retest == "Y") && !studentGroup.students.Any(t => t.Retest == "N" || t.Retest == ""))
    {
        this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400);
    }
}

The part I converted to a parallel foreach:

Parallel.ForEach (studentTestGroup.AsParallel().Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")).AsParallel(), studentGroup =>
            {

Answer 1

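Do not combine PLINQ (AsParallel) with the TPL (Parallel.ForEach). This can even slow things down, because you overload the thread pool. Use one technique or the other. The most you can gain from parallelism is a speedup proportional to the number of CPU cores. Beyond that you need a profiler, plus heuristics about your collection that only you know. As for the code you posted: do not repeat the same computation! For example: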

studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "")

appears in several different conditions. You can compute it once instead of every time. The same goes for:

studentGroup.students.Any(t => t.Retest == "Y")

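"Any" iterates the collection until the predicate matches, so do not iterate a large collection several times for all the if statements that share the same condition! Ask yourself questions about the collection: maybe you can use a dictionary to look up items, or some other structure. As I said, though, that is more a matter of heuristics about your collection, which may give some speedup. Hope this helps. If you want more than that, you need a profiler.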
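To make the "compute it once" advice concrete, here is a minimal sketch, not the asker's actual code. The StudentTest type, the RetestFilter class, and the int type of TestScaledScore are assumptions, since the question does not show the element type. It counts retests and non-retests in a single pass per group and returns the keys to discard as a HashSet, so the later lookup is a hash probe rather than a nested scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the question does not show the real one.
public class StudentTest
{
    public int PrimaryKey { get; set; }
    public string TestName { get; set; } = "";
    public string STI { get; set; } = "";
    public string Retest { get; set; } = "";
    public int TestScaledScore { get; set; } // assumed numeric here
}

public static class RetestFilter
{
    // Returns the primary keys that should be removed for one (TestName, STI) group.
    public static HashSet<int> KeysToRemove(IReadOnlyList<StudentTest> group)
    {
        // One pass gathers both facts that every branch needs.
        int retests = 0, nonRetests = 0;
        foreach (var s in group)
        {
            if (s.Retest == "Y") retests++;
            else if (s.Retest == "N" || s.Retest == "") nonRetests++;
        }

        var remove = new HashSet<int>();
        if (retests == 0) return remove; // no retest in the group: nothing to do

        if (nonRetests >= 1)
        {
            // Keep the top `nonRetests` scores and discard the rest.
            foreach (var s in group.OrderByDescending(x => x.TestScaledScore).Skip(nonRetests))
                remove.Add(s.PrimaryKey);
        }
        else
        {
            // Only retests: discard scores below 400.
            foreach (var s in group.Where(x => x.TestScaledScore < 400))
                remove.Add(s.PrimaryKey);
        }
        return remove;
    }
}
```

With the keys of all groups merged into one set, a single call such as list.RemoveAll(t => keys.Contains(t.PrimaryKey)) would make one pass over the list instead of one pass per group.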

Answer 2

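What type is this?

I think there is a race condition in this.RemoveAll(). If a list/collection is modified from multiple threads at the same time, the result of the operations on it is undefined. In this case you could put a lock statement around the RemoveAll() call, but then the benefit of the parallel foreach would be lost.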
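As a rough illustration of the lock approach, here is a sketch on a plain List<int> rather than the asker's collection. The LockedRemoval class and its method are made up for this example. It gives the correct result, but because all the work happens inside the lock, the threads effectively run one after another.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedRemoval
{
    // Removes every multiple of each divisor from `items`,
    // iterating over the divisors in parallel.
    public static void RemoveMultiples(List<int> items, IEnumerable<int> divisors)
    {
        var gate = new object();
        Parallel.ForEach(divisors, d =>
        {
            // List<T> is not safe for concurrent writes, so only one
            // thread at a time may run RemoveAll.
            lock (gate)
            {
                items.RemoveAll(x => x % d == 0);
            }
        });
    }
}
```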

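Another possibility is to remember all the items that should be removed and remove them after the foreach. I think adding to a collection from multiple threads should be possible (with a thread-safe collection such as ConcurrentBag<T>, that is; List<T> itself is not safe for concurrent Add calls).

Edit: this might be a faster implementation for removing the specified items: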
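A sketch of the "collect first, remove afterwards" idea, again on a plain List<int> with made-up names (DeferredRemoval, RemoveMultiples). The parallel phase only reads the list and writes to a ConcurrentBag<T>. The list is then modified once, on a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DeferredRemoval
{
    public static void RemoveMultiples(List<int> items, IEnumerable<int> divisors)
    {
        // ConcurrentBag<T> supports Add from many threads at once.
        var toRemove = new ConcurrentBag<int>();

        // Parallel phase: `items` is only read, all writes go to the bag.
        Parallel.ForEach(divisors, d =>
        {
            foreach (var x in items)
            {
                if (x % d == 0) toRemove.Add(x);
            }
        });

        // Sequential phase: one pass over the list with hash lookups.
        var removeSet = new HashSet<int>(toRemove);
        items.RemoveAll(x => removeSet.Contains(x));
    }
}
```

Whether this beats the sequential loop depends on how expensive the per-group work is compared to the final removal pass, which is something a profiler would have to confirm.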

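var itemsToRemove = new List<StudentTest>(); // StudentTest is a placeholder for the element type of this, which the question does not show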
foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
{
    int countNo = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
    bool anyYes = studentGroup.students.Any(t => t.Retest == "Y");
    if (anyYes && countNo == 1)
    {
        var studentToKeep = studentGroup.students.Single(t => t.Retest == "N" || t.Retest == "");

        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey));
    }
    else if (anyYes && countNo > 1)
    {
        var studentsToKeep = studentGroup.students.Where(t => t.Retest == "N" || t.Retest == "");

        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey)));
    }
    else if (anyYes && countNo == 0)
    {
        itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400));
    }
}
foreach (var itemToRemove in itemsToRemove)
{
    this.Remove(itemToRemove);
}

Parallelism with LINQ
