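I have a foreach loop over the results of a LINQ query. I am trying to make it run faster (it takes about an hour), but when I converted it to Parallel.ForEach I got different results from the standard foreach, even though it cut the run time in half. Could someone more experienced with LINQ and parallelism help me with this?
I would really like some way to speed this up, and I am also confused about why Parallel.ForEach does not give me the same results. Maybe someone smarter than me can fill me in.
The standard foreach: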

    var studentTestGroup = from st in this
                           group st by new { st.TestName, st.STI }
                           into studentGroups
                           select new { TestName = studentGroups.Key.TestName, STI = studentGroups.Key.STI, students = studentGroups };

    // Loop through each group that has more than one test, or where there exist any retests at all.
    foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
    {
        if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") == 1)
        {
            // For a test name and STI, if there exists a retest and only 1 non-retest, keep the highest and discard the rest.
            var studentToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).First();
            this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey);
        }
        else if (studentGroup.students.Any(t => t.Retest == "Y") && studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "") > 1)
        {
            // For a test name and STI, if there exists a retest and more than 1 non-retest,
            // then keep the highest (number of non-retests) scores and discard the rest.
            int numRetests = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
            var studentsToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).Take(numRetests);
            this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey));
        }
        else if (studentGroup.students.Any(t => t.Retest == "Y") && !studentGroup.students.Any(t => t.Retest == "N" || t.Retest == ""))
        {
            // Only retests exist for this test name and STI: discard scores below 400.
            this.RemoveAll(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400);
        }
    }

The part I converted to Parallel.ForEach:

    Parallel.ForEach (studentTestGroup.AsParallel().Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")).AsParallel(), studentGroup =>
    {

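Answer 0 (score: 3)
Don't combine PLINQ (AsParallel) with the TPL (Parallel.ForEach). It can even slow things down, because you overload the thread pool. Use one of the two techniques. The most you can gain from parallelism is a speedup of roughly the number of CPU cores. After that, you can use a profiler; anything further depends on heuristics about your collection that only you know.
As for the code you provided: don't repeat yourself! For example:

    studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "")

appears in all the different conditions. You can compute it once instead of every time. The same goes for:

    studentGroup.students.Any(t => t.Retest == "Y")

`Any` iterates the collection until the predicate matches (the whole collection if nothing matches), so don't iterate a large collection several times for if statements that all share the same condition!
Ask yourself about the collection: maybe you can use a dictionary to look up items, or some other structure. But as I said, that is more a heuristic about your collection that may provide some speedup.
Hope this helps. If you want more than that, you need a profiler.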
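As a rough sketch of the advice above (evaluate each predicate once per group and use a hash-based structure for lookups), the question's three rules could be written like this. The `StudentTest` class, the `RetestFilter.KeysToRemove` helper, and the assumption that `TestScaledScore` is an `int` are all made up for illustration; the question does not show the real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; only the members used in the question are modeled.
public sealed class StudentTest
{
    public int PrimaryKey { get; set; }
    public string TestName { get; set; } = "";
    public string STI { get; set; } = "";
    public string Retest { get; set; } = "";
    public int TestScaledScore { get; set; } // assumed numeric for this sketch
}

public static class RetestFilter
{
    // Returns the primary keys that the question's three rules would discard.
    public static HashSet<int> KeysToRemove(IEnumerable<StudentTest> tests)
    {
        var remove = new HashSet<int>();
        foreach (var group in tests.GroupBy(t => new { t.TestName, t.STI }))
        {
            // Materialize once so the group is not re-enumerated per predicate.
            var students = group.ToList();

            int retests = students.Count(t => t.Retest == "Y");
            if (retests == 0) continue; // no retest in this group: nothing to discard

            int nonRetests = students.Count(t => t.Retest == "N" || t.Retest == "");

            IEnumerable<StudentTest> discard;
            if (nonRetests >= 1)
            {
                // Keep the top `nonRetests` scores; this covers both the
                // "exactly one" and the "more than one" non-retest branches.
                discard = students.OrderByDescending(t => t.TestScaledScore).Skip(nonRetests);
            }
            else
            {
                // Only retests exist: discard scores below 400.
                discard = students.Where(t => t.TestScaledScore < 400);
            }

            foreach (var s in discard) remove.Add(s.PrimaryKey);
        }
        return remove;
    }
}
```

With the keys collected, a single `RemoveAll(t => keys.Contains(t.PrimaryKey))` on the list applies all removals in one pass instead of scanning the whole list once per group.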
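Answer 1 (score: 1)
What type is `this`?
I think `this.RemoveAll()` has a race condition. If you modify a list/collection from multiple threads at the same time, the result of the operations on the collection is undefined.
In this case you could put a lock statement around the `RemoveAll()` call, but then the benefit of the parallel foreach would be gone.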
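A minimal sketch of the locking option, using a made-up rule on a plain `List<int>` rather than the question's types. `LockedRemoval` and `KeepMaxPerBucket` are hypothetical names. The result is deterministic, but since the whole loop body sits inside the lock, the iterations effectively run one at a time, which is exactly why the parallel speedup disappears.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LockedRemoval
{
    // Illustrative rule: group numbers by remainder (n % buckets) and keep
    // only the largest number of each group.
    public static void KeepMaxPerBucket(List<int> numbers, int buckets)
    {
        var gate = new object();

        // Snapshot the groups first so the loop never reads `numbers` unlocked.
        var groups = numbers.GroupBy(n => n % buckets)
                            .Select(g => new { Bucket = g.Key, Max = g.Max() })
                            .ToList();

        Parallel.ForEach(groups, g =>
        {
            // Only one thread may mutate the list at a time.
            lock (gate)
            {
                numbers.RemoveAll(n => n % buckets == g.Bucket && n != g.Max);
            }
        });
    }
}
```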
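Another possibility is to remember all the items that should be removed and remove them after the foreach. Adding to a collection from multiple threads is fine as long as the collection is thread-safe (for example `ConcurrentBag<T>`); `List<T>.Add` itself is not.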
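The "collect first, remove afterwards" idea might look roughly like the following. This is a sketch under assumptions, not the asker's code: `DeferredRemoval.RemoveInParallel` and its parameters are hypothetical, and the per-group rule is passed in as a delegate. The parallel loop only reads its own group and writes to a `ConcurrentBag<T>`; the list is mutated once, on the calling thread, after the loop has finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredRemoval
{
    // Runs `selectDiscards` for each group in parallel, gathers the results in a
    // thread-safe bag, then removes them from `source` in a single pass.
    // Returns the number of removed items.
    public static int RemoveInParallel<T, TKey>(
        List<T> source,
        Func<T, TKey> groupKey,
        Func<IReadOnlyList<T>, IEnumerable<T>> selectDiscards)
    {
        // Group and materialize up front so the parallel loop never
        // enumerates `source` itself.
        var groups = source.GroupBy(groupKey).Select(g => g.ToList()).ToList();

        var discards = new ConcurrentBag<T>();
        Parallel.ForEach(groups, group =>
        {
            foreach (var item in selectDiscards(group))
                discards.Add(item); // ConcurrentBag<T>.Add is thread-safe
        });

        // Single-threaded mutation; the HashSet gives constant-time lookups.
        var discardSet = new HashSet<T>(discards);
        return source.RemoveAll(item => discardSet.Contains(item));
    }
}
```

For example, keeping only the best score per student:

```csharp
var scores = new List<(string Student, int Score)>
{
    ("a", 1), ("a", 5), ("b", 3), ("b", 2), ("c", 9)
};
int removed = DeferredRemoval.RemoveInParallel(
    scores,
    s => s.Student,
    g => g.OrderByDescending(s => s.Score).Skip(1));
```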
Edit: this might be a faster implementation for removing the specified items:

    // StudentTest is a placeholder: the question does not name the element type of `this`.
    var itemsToRemove = new List<StudentTest>();
    foreach (var studentGroup in studentTestGroup.Where(t => t.students.Count() > 1 || t.students.Any(x => x.Retest == "Y")))
    {
        // Evaluate each predicate once per group instead of once per branch.
        int countNo = studentGroup.students.Count(t => t.Retest == "N" || t.Retest == "");
        bool anyYes = studentGroup.students.Any(t => t.Retest == "Y");
        if (anyYes && countNo == 1)
        {
            // As in the question: keep the highest score, discard the rest.
            var studentToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).First();
            itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && t.PrimaryKey != studentToKeep.PrimaryKey));
        }
        else if (anyYes && countNo > 1)
        {
            // As in the question: keep the countNo highest scores. ToList() avoids re-sorting on every lookup.
            var studentsToKeep = studentGroup.students.OrderByDescending(t => t.TestScaledScore).Take(countNo).ToList();
            itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && !studentsToKeep.Any(x => x.PrimaryKey == t.PrimaryKey)));
        }
        else if (anyYes && countNo == 0)
        {
            itemsToRemove.AddRange(this.Where(t => t.STI == studentGroup.STI && t.TestName == studentGroup.TestName && Convert.ToInt32(t.TestScaledScore) < 400));
        }
    }

    // The list is only modified here, after all groups have been examined.
    foreach (var itemToRemove in itemsToRemove)
    {
        this.Remove(itemToRemove);
    }