Question

我有n个元素（例如A，B，C和D），需要在所有这些元素之间进行计算。

Calculation 1 = A with B  
Calculation 2 = A with C  
Calculation 3 = A with D  
Calculation 4 = B with C  
Calculation 5 = B with D  
Calculation 6 = C with D

实际上有超过1000个元素，我想要并行化这个过程。

请注意，我无法同时访问2个线程中的元素。例如，这使得无法同时执行计算1 和计算2 ，因为它们都使用 A 元素。编辑：我可以从2个线程访问一个元素但是如果我只是拆分计算并依赖于线程安全的锁定，它会使一切都非常慢。

是否已存在针对此类问题的分配算法？似乎很多人已经有过同样的问题，但我在伟大的互联网上找不到任何东西。 ;）

单线程示例代码：

for (int i = 0; i < elementCount; i++)
{
   for (int j = i + 1; j < elementCount; j++)
   {
      Calculate(element[i], element[j]);
   }
}

Answer 1

您可以应用round-robin tournament algorithm，以便组织所有可能的对（N*(N-1)结果）。

所有设置元素（玩家）形成两行，列是对本轮。第一个元素是固定的，其他元素是循环移位的。

因此，您可以运行最多N/2个线程来获取第一对集合的结果，然后重新排序索引并继续

摘自wiki：

The circle method is the standard algorithm to create a schedule for a round-robin tournament. All competitors are assigned to numbers, and then paired in the first round:
Round 1. (1 plays 14, 2 plays 13, ... )
 1  2  3  4  5  6  7
 14 13 12 11 10 9  8   

then fix one of the contributors in the first or last column of the table (number one in this example) and rotate the others clockwise one position
Round 2. (1 plays 13, 14 plays 12, ... )
 1  14 2  3  4  5  6
 13 12 11 10 9  8  7
Round 3. (1 plays 12, 13 plays 11, ... )
 1  13 14 2  3  4  5
 12 11 10 9  8  7  6   

until you end up almost back at the initial position
Round 13. (1 plays 2, 3 plays 14, ... )
 1  3  4  5  6  7  8
 2 14  13 12 11 10 9

Answer 2

很容易证明没有办法分配你的计算以便不会发生冲突（也就是说，除非你手动订购计算并放置圆边界，如@Mbo建议的那样），这意味着没有分布在允许你永远不会锁定的多个线程中。

证明：鉴于您的要求，任何涉及数据对象A的计算都应该发生在给定的线程T上（只有这样才能确保您永远不会锁定A）。
然后，线程T必须处理至少一对包含输入列表的每个其他对象（B，C，D）。从基本要求得出，T也是处理与对象B相关的一切。而C.和D.所以一切因此，只有T可以工作 QED。没有可能永远无法锁定的并行化。

围绕＃1：map / reduce

那说......这是分而治之的典型案例。你是对的，简单的添加可能需要关键部分锁定，而不执行执行顺序。那是因为你的关键操作（加法）有一个很好的属性，关联性：A +（B + C）=（A + B）+ C，在交换之上。

换句话说，此操作是（parralel友好的）减少操作的候选者。

所以这里的关键可能是：

发出所有有趣的对子流
将每对映射到一个或多个部分结果
按主要对象（A，B，C）
通过组合部分结果来减少每个组

示例（伪）代码

static class Data { int i = 0; }
static class Pair { Data d1; Data d2; }
static class PartialComputation { Data d; int sum; }

Data[] data = ...
Stream<Pair> allPairs = ... // Something like IntStream(0, data.length-1).flatMap(idx -> IntStream(idx+1 to data.length ).map(idx2 -> new Pair(data[idx], data[idx2])))
allPairs.flatMap(pair -> Stream.of(new ParticalComputation(pair.d1, pair.d1.i + pair.d2.i), new PartialComputation(pair.d2, pair.d2.i+pair.d1.i)) // Map everything, parallely, to partial results keyable by the original data object
allPairs.collect(Collectors.groupByParallel(
    partialComp -> partialComp.d, // Regroup by the original data object
    Collectors.reducing(0, (sum1, sum2) -> sum1.sum + sum2.sum)) // reduce by summing
))

绕过2：信任实现

事实上，java中的无竞争锁已经变得更便宜了。最重要的是，纯锁定有时候有更好的选择，比如Java中的Atomic类型（例如AtomicLong，如果你要总结的东西），使用CAS而不是锁定，这可以更快（google for it，我通常是指Java Concurrency）实践中的硬数字书。）

事实是，如果你有1000到10k个不同的元素（转换成至少数百万对），比如8个CPU，那么争用（或者你的8个线程中至少有2个将处理相同的概率）元素）非常低。而且我宁愿直接测量它，而不是直接说出来＃34;我不能为锁定＆＃34;，特别是如果操作可以使用原子类型实现。

多线程 - 集合中所有对之间的计算

2 个答案:

围绕＃1：map / reduce

绕过2：信任实现