Let's say I have these two matrices:
double[][] a = new double[2][2];
a[0][0] = 1;
a[0][1] = 2;
a[1][0] = 3;
a[1][1] = 4;
double[][] b = new double[2][2];
b[0][0] = 1;
b[0][1] = 2;
b[1][0] = 3;
b[1][1] = 4;
The traditional way to sum these matrices would be a nested for loop:
int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];
for(int i = 0; i < rows; i++){
for(int j = 0; j < cols; j++){
res[i][j] = a[i][j] + b[i][j];
}
}
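I'm very new to the Stream API, but I think this would be a great fit for parallelStream, so my question is: is there a way to do this and take advantage of parallel processing?
Edit: Not sure if this is the right place, but here we go. Using some of the suggestions, I put the streams to the test. The setup is as follows.
Classic approach: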
public class ClassicMatrix {
private final double[][] components;
private final int cols;
private final int rows;
public ClassicMatrix(final double[][] components){
this.components = components;
this.rows = components.length;
this.cols = components[0].length;
}
public ClassicMatrix addComponents(final ClassicMatrix a) {
final double[][] res = new double[rows][cols];
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
res[i][j] = components[i][j] + a.components[i][j];
}
}
return new ClassicMatrix(res);
}
}
Using @dkatzel's suggestion:
public class MatrixStream1 {
private final double[][] components;
private final int cols;
private final int rows;
public MatrixStream1(final double[][] components){
this.components = components;
this.rows = components.length;
this.cols = components[0].length;
}
public MatrixStream1 addComponents(final MatrixStream1 a) {
final double[][] res = new double[rows][cols];
IntStream.range(0, rows*cols).parallel().forEach(i -> {
int x = i / cols; // row index
int y = i % cols; // column index
res[x][y] = components[x][y] + a.components[x][y];
});
return new MatrixStream1(res);
}
}
Using @Eugene's suggestion:
public class MatrixStream2 {
private final double[][] components;
private final int cols;
private final int rows;
public MatrixStream2(final double[][] components) {
this.components = components;
this.rows = components.length;
this.cols = components[0].length;
}
public MatrixStream2 addComponents(final MatrixStream2 a) {
final double[][] res = new double[rows][cols];
IntStream.range(0, rows)
.forEach(i -> Arrays.parallelSetAll(res[i], j -> components[i][j] + a.components[i][j]));
return new MatrixStream2(res);
}
}
And a test class, with each method run 3 separate times (just replace the method name in main()):
public class MatrixTest {
private final static String path = "/media/manuel/workspace/data/";
public static void main(String[] args) {
final List<Double[]> lst = new ArrayList<>();
for (int i = 100; i < 8000; i = i + 400) {
final Double[] d = testClassic(i);
System.out.println(d[0] + " : " + d[1]);
lst.add(d);
}
IOUtils.saveToFile(path + "classic.csv", lst);
}
public static Double[] testClassic(final int i) {
final ClassicMatrix a = new ClassicMatrix(rand(i));
final ClassicMatrix b = new ClassicMatrix(rand(i));
final long start = System.currentTimeMillis();
final ClassicMatrix mul = a.addComponents(b);
final long now = System.currentTimeMillis();
final double elapsed = (now - start);
return new Double[] { (double) i, elapsed };
}
public static Double[] testStream1(final int i) {
final MatrixStream1 a = new MatrixStream1(rand(i));
final MatrixStream1 b = new MatrixStream1(rand(i));
final long start = System.currentTimeMillis();
final MatrixStream1 mul = a.addComponents(b);
final long now = System.currentTimeMillis();
final double elapsed = (now - start);
return new Double[] { (double) i, elapsed };
}
public static Double[] testStream2(final int i) {
final MatrixStream2 a = new MatrixStream2(rand(i));
final MatrixStream2 b = new MatrixStream2(rand(i));
final long start = System.currentTimeMillis();
final MatrixStream2 mul = a.addComponents(b);
final long now = System.currentTimeMillis();
final double elapsed = (now - start);
return new Double[] { (double) i, elapsed };
}
private static double[][] rand(final int size) {
final double[][] rnd = new double[size][size];
for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {
rnd[i][j] = Math.random();
}
}
return rnd;
}
}
Results:
Classic Matrix size, Time (ms)
100.0,1.0
500.0,5.0
900.0,5.0
1300.0,43.0
1700.0,94.0
2100.0,26.0
2500.0,33.0
2900.0,46.0
3300.0,265.0
3700.0,71.0
4100.0,87.0
4500.0,380.0
4900.0,432.0
5300.0,215.0
5700.0,238.0
6100.0,577.0
6500.0,677.0
6900.0,609.0
7300.0,584.0
7700.0,592.0
Stream1, Time(ms)
100.0,86.0
500.0,13.0
900.0,9.0
1300.0,47.0
1700.0,92.0
2100.0,29.0
2500.0,33.0
2900.0,46.0
3300.0,253.0
3700.0,71.0
4100.0,90.0
4500.0,352.0
4900.0,373.0
5300.0,497.0
5700.0,485.0
6100.0,579.0
6500.0,711.0
6900.0,800.0
7300.0,780.0
7700.0,902.0
Stream2, Time(ms)
100.0,111.0
500.0,42.0
900.0,12.0
1300.0,54.0
1700.0,97.0
2100.0,110.0
2500.0,177.0
2900.0,71.0
3300.0,250.0
3700.0,106.0
4100.0,359.0
4500.0,143.0
4900.0,233.0
5300.0,261.0
5700.0,289.0
6100.0,406.0
6500.0,814.0
6900.0,830.0
7300.0,828.0
7700.0,911.0
No improvement at all. Where is the flaw? Are the matrices too small (7700 x 7700)? Anything larger than that blows up my computer's memory.
Answer 0 (score: 12)
One way to do this is with Arrays.parallelSetAll:
int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];
Arrays.parallelSetAll(res, i -> {
Arrays.parallelSetAll(res[i], j -> a[i][j] + b[i][j]);
return res[i];
});
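I'm not 100% sure, but I think the inner call to Arrays.parallelSetAll might not be worth the overhead of spawning an inner parallelization for the columns of every row. It may be enough to parallelize only the per-row sum: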
Arrays.parallelSetAll(res, i -> {
Arrays.setAll(res[i], j -> a[i][j] + b[i][j]);
return res[i];
});
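In any case, you should measure carefully before adding parallelization to an algorithm, because the overhead is often so large that it is not worth using.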
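As a rough illustration of what "measure carefully" could mean without a benchmark framework, here is a minimal sketch. The class name AddTimingSketch, the helper bestMillis, and the warm-up and run counts are all assumptions made for this example; a harness such as JMH will give more trustworthy numbers.

```java
import java.util.Arrays;

final class AddTimingSketch {

    // Plain nested-loop addition, used as the baseline.
    static double[][] addSequential(double[][] a, double[][] b) {
        double[][] res = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                res[i][j] = a[i][j] + b[i][j];
            }
        }
        return res;
    }

    // Rows are distributed across threads; each row is filled sequentially.
    static double[][] addRowParallel(double[][] a, double[][] b) {
        double[][] res = new double[a.length][a[0].length];
        Arrays.parallelSetAll(res, i -> {
            Arrays.setAll(res[i], j -> a[i][j] + b[i][j]);
            return res[i];
        });
        return res;
    }

    // Hypothetical helper: runs the task untimed a few times so the JIT can
    // warm up, then returns the best of the timed runs in milliseconds.
    static double bestMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    // Square matrix filled with random values.
    static double[][] random(int n) {
        double[][] m = new double[n][n];
        for (double[] row : m) {
            Arrays.setAll(row, j -> Math.random());
        }
        return m;
    }

    public static void main(String[] args) {
        int n = 1000; // arbitrary size chosen for the sketch
        double[][] a = random(n);
        double[][] b = random(n);
        System.out.printf("sequential:   %.3f ms%n", bestMillis(() -> addSequential(a, b), 5, 10));
        System.out.printf("row-parallel: %.3f ms%n", bestMillis(() -> addRowParallel(a, b), 5, 10));
    }
}
```

The warm-up runs matter because the first executions include class loading and JIT compilation, which can easily dominate a single timed call.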
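Answer 1 (score: 10)
This still has to be measured (I will do that later), but shouldn't what is already built into Arrays.parallelSetAll do the job in the fastest possible way?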
for (int i = 0; i < a.length; ++i) {
int j = i;
Arrays.parallelSetAll(r[j], x -> a[j][x] + b[j][x]);
}
Or even nicer:
IntStream.range(0, a.length)
.forEach(i -> Arrays.parallelSetAll(r[i], j -> a[i][j] + b[i][j]));
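This is also friendly to CPU caches, since there is a good chance that the next entry sits in the same cache line. Reading in the opposite order (columns first, then rows) would scatter the reads all over the place.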
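To make the traversal-order point concrete, here is a small sketch that sums the same matrix in both orders. The class and method names are made up for this example, the matrix is assumed to be rectangular, and the printed timings are only indicative since they depend on the machine and the JIT.

```java
final class TraversalOrderSketch {

    // Row-major: the inner loop walks consecutive elements of one row array.
    static double sumRowMajor(double[][] m) {
        double sum = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    // Column-major: the inner loop touches a different row array on every step.
    static double sumColumnMajor(double[][] m) {
        double sum = 0;
        for (int j = 0; j < m[0].length; j++) {
            for (int i = 0; i < m.length; i++) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2000; // arbitrary size chosen for the sketch
        double[][] m = new double[n][n];
        for (double[] row : m) {
            java.util.Arrays.fill(row, 1.0);
        }

        long t0 = System.nanoTime();
        double rowMajor = sumRowMajor(m);
        long t1 = System.nanoTime();
        double columnMajor = sumColumnMajor(m);
        long t2 = System.nanoTime();

        System.out.printf("row-major:    sum=%.1f in %.2f ms%n", rowMajor, (t1 - t0) / 1e6);
        System.out.printf("column-major: sum=%.1f in %.2f ms%n", columnMajor, (t2 - t1) / 1e6);
    }
}
```

Both methods compute the same total; only the memory access pattern differs.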
I have put together a jmh test here. Note that Federico's answer is the fastest. Go upvote his idea.
The results are as follows:
Benchmark                 (howManyEntries)  Mode  Cnt    Score     Error  Units
DoubleArraySum.dkatzel                 100  avgt   10    0.055 ±  0.005  ms/op
DoubleArraySum.dkatzel                 500  avgt   10    0.997 ±  0.156  ms/op
DoubleArraySum.dkatzel                1000  avgt   10    4.162 ±  0.368  ms/op
DoubleArraySum.dkatzel                3000  avgt   10   39.619 ±  4.391  ms/op
DoubleArraySum.dkatzel                8000  avgt   10  236.468 ± 41.599  ms/op
DoubleArraySum.eugene                  100  avgt   10    0.671 ±  0.187  ms/op
DoubleArraySum.eugene                  500  avgt   10    6.317 ±  0.268  ms/op
DoubleArraySum.eugene                 1000  avgt   10   14.751 ±  0.676  ms/op
DoubleArraySum.eugene                 3000  avgt   10   65.174 ±  6.044  ms/op
DoubleArraySum.eugene                 8000  avgt   10  285.571 ± 23.206  ms/op
DoubleArraySum.federico1               100  avgt   10    0.169 ±  0.010  ms/op
DoubleArraySum.federico1               500  avgt   10    1.999 ±  0.217  ms/op
DoubleArraySum.federico1              1000  avgt   10    6.087 ±  1.108  ms/op
DoubleArraySum.federico1              3000  avgt   10   40.825 ±  4.853  ms/op
DoubleArraySum.federico1              8000  avgt   10  267.446 ± 37.490  ms/op
DoubleArraySum.federico2               100  avgt   10    0.034 ±  0.003  ms/op
DoubleArraySum.federico2               500  avgt   10    0.974 ±  0.152  ms/op
DoubleArraySum.federico2              1000  avgt   10    3.245 ±  0.080  ms/op
DoubleArraySum.federico2              3000  avgt   10   30.503 ±  5.960  ms/op
DoubleArraySum.federico2              8000  avgt   10  183.183 ± 21.861  ms/op
DoubleArraySum.holijava                100  avgt   10    0.063 ±  0.002  ms/op
DoubleArraySum.holijava                500  avgt   10    1.112 ±  0.020  ms/op
DoubleArraySum.holijava               1000  avgt   10    4.138 ±  0.062  ms/op
DoubleArraySum.holijava               3000  avgt   10   41.784 ±  1.029  ms/op
DoubleArraySum.holijava               8000  avgt   10  266.590 ±  4.080  ms/op
DoubleArraySum.pivovarit               100  avgt   10    0.112 ±  0.002  ms/op
DoubleArraySum.pivovarit               500  avgt   10    2.427 ±  0.075  ms/op
DoubleArraySum.pivovarit              1000  avgt   10    9.572 ±  0.355  ms/op
DoubleArraySum.pivovarit              3000  avgt   10   84.413 ±  2.197  ms/op
DoubleArraySum.pivovarit              8000  avgt   10  690.942 ± 34.993  ms/op
Edit
Here is a more readable output, ordered from fastest to slowest for each input size (federico wins for all inputs):
100=[federico2, dkatzel, holijava, pivovarit, federico1, eugene]
500=[federico2, dkatzel, holijava, federico1, pivovarit, eugene]
1000=[federico2, holijava, dkatzel, federico1, pivovarit, eugene]
3000=[federico2, dkatzel, federico1, holijava, eugene, pivovarit]
8000=[federico2, dkatzel, holijava, federico1, eugene, pivovarit]
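Answer 2 (score: 6)
The only option I see here is to more or less generate all possible pairs of indices, then extract the elements and apply the sum. Using parallel streams will not have any additional positive effect on such a small example, but you can use the Stream API here (and convert to parallel whenever you need to), although the result does not look as good as expected: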
IntStream.range(0, a.length).boxed()
    .flatMap(i -> IntStream.range(0, a[0].length)
        .mapToObj(j -> new AbstractMap.SimpleImmutableEntry<>(i, j)))
    .parallel()
    .forEach(e -> {
        res[e.getKey()][e.getValue()]
            = a[e.getKey()][e.getValue()] + b[e.getKey()][e.getValue()];
    });
We need to introduce an intermediary (a pair of indices) so that we can parallelize a single stream of pairs instead of working with parallelized nested streams.
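Another advanced approach is to implement your own custom collector, but at some point that will still involve a nested loop.
The real power of the Stream API can be observed when trying to sum all the values from both arrays: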
Stream
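One way such a total over both arrays could be written is sketched below. This is an illustration under assumptions rather than the answer's exact code; the class name TotalSumSketch and the method totalSum are made up for the example.

```java
import java.util.Arrays;
import java.util.stream.Stream;

final class TotalSumSketch {

    // Flattens both matrices into one DoubleStream and sums every element.
    static double totalSum(double[][] a, double[][] b) {
        return Stream.of(a, b)                              // Stream<double[][]>
                .flatMap(matrix -> Arrays.stream(matrix))   // Stream<double[]> of rows
                .flatMapToDouble(row -> Arrays.stream(row)) // DoubleStream of cells
                .sum();
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{1, 2}, {3, 4}};
        System.out.println(totalSum(a, b)); // prints 20.0
    }
}
```

Here the stream pipeline needs no index bookkeeping at all, because the result is a single value rather than a matrix with a position for each element.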
Answer 3 (score: 3)
You can use an IntStream over the number of cells in the matrix, then do some math to convert that int into a matrix position.
IntStream.range(0, rows*cols)
.parallel()
.forEach( i->{
int x = i / cols; // row index
int y = i % cols; // column index
res[x][y] = a[x][y] + b[x][y];
});
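To check the index arithmetic on a non-square matrix, here is a self-contained sketch. The flat index is split using the column count, which is what keeps it valid when rows and cols differ. The class name FlatIndexSketch and the sample values are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class FlatIndexSketch {

    // Adds two rectangular matrices of the same shape using one flat parallel stream.
    static double[][] add(double[][] a, double[][] b) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] res = new double[rows][cols];
        IntStream.range(0, rows * cols)
                .parallel()
                .forEach(i -> {
                    int x = i / cols; // row index, 0 .. rows - 1
                    int y = i % cols; // column index, 0 .. cols - 1
                    res[x][y] = a[x][y] + b[x][y];
                });
        return res;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};       // 2 rows, 3 columns
        double[][] b = {{10, 20, 30}, {40, 50, 60}};
        System.out.println(Arrays.deepToString(add(a, b)));
        // prints [[11.0, 22.0, 33.0], [44.0, 55.0, 66.0]]
    }
}
```

Each index maps to a distinct cell, so the parallel writes never touch the same element.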
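The other answers to this question are not only wrong (at the time of writing), they also create multiple Streams, which hurts performance, and they are not even parallel.
As @Holger pointed out, while this single stream may be easier to read, the cost of the division and modulus makes it slower than a stream of streams, and that cost only grows with many cores. I'm not sure how much would be needed to offset it.
Answer 4 (score: 3)
How about this?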
double[][] res = IntStream.range(0, a.length).parallel()
.mapToObj(i ->
IntStream.range(0, a[i].length)
.mapToDouble(j -> a[i][j] + b[i][j])
.toArray()
)
.toArray(double[][]::new);
System.out.println(Arrays.deepToString(res));
// ^--- [[2.0, 4.0], [6.0, 8.0]]