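I wrote a couple of Java classes, SingleThreadedCompute and MultithreadedCompute, to demonstrate the fact (or what I always thought was a fact!) that if you parallelize a compute-centric (no I/O) task on a single-core machine, you don't get a speedup. Indeed, my understanding is that parallelizing such tasks actually slows things down, because now you have to deal with context-switching overhead. Well, I ran the classes, and the parallel version unexpectedly runs faster: the single-threaded version consistently runs in just over 7 seconds on my machine, and the multithreaded version consistently runs in just over 6 seconds. Can anyone explain how this is possible?
Here are the classes, if anyone wants to look at them or try them for themselves.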
public final class SingleThreadedCompute {
    private static final long _1B = 1000000000L; // one billion

    public static void main(String[] args) {
        long startMs = System.currentTimeMillis();
        long total = 0;
        for (long i = 0; i < _1B; i++) { total += i; }
        System.out.println("total=" + total);
        long elapsedMs = System.currentTimeMillis() - startMs;
        System.out.println("Elapsed time: " + elapsedMs + " ms");
    }
}
Here is the multithreaded version:
public final class MultithreadedCompute {
    private static final long _1B = 1000000000L; // one billion
    private static final long _100M = _1B / 10L;

    public static void main(String[] args) {
        long startMs = System.currentTimeMillis();
        System.out.println("Creating workers");
        Worker[] workers = new Worker[10];
        for (int i = 0; i < 10; i++) {
            workers[i] = new Worker(i * _100M, (i+1) * _100M);
        }
        System.out.println("Starting workers");
        for (int i = 0; i < 10; i++) { workers[i].start(); }
        for (int i = 0; i < 10; i++) {
            try {
                workers[i].join();
                System.out.println("Joined with thread " + i);
            } catch (InterruptedException e) { /* can't happen */ }
        }
        System.out.println("Summing worker totals");
        long total = 0;
        for (int i = 0; i < 10; i++) { total += workers[i].getTotal(); }
        System.out.println("total=" + total);
        long elapsedMs = System.currentTimeMillis() - startMs;
        System.out.println("Elapsed time: " + elapsedMs + " ms");
    }

    private static class Worker extends Thread {
        private long start, end;
        private long total;

        public Worker(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public void run() {
            System.out.println("Computing sum " + start + " + ... + (" + end + " - 1)");
            for (long i = start; i < end; i++) { total += i; }
        }

        public long getTotal() { return total; }
    }
}
Here is the output from running the single-threaded version:
total=499999999500000000
Elapsed time: 7031 ms
Here is the output from running the multithreaded version:
Creating workers
Starting workers
Computing sum 0 + ... + (100000000 - 1)
Computing sum 100000000 + ... + (200000000 - 1)
Computing sum 200000000 + ... + (300000000 - 1)
Computing sum 300000000 + ... + (400000000 - 1)
Computing sum 400000000 + ... + (500000000 - 1)
Computing sum 500000000 + ... + (600000000 - 1)
Computing sum 600000000 + ... + (700000000 - 1)
Computing sum 700000000 + ... + (800000000 - 1)
Computing sum 800000000 + ... + (900000000 - 1)
Computing sum 900000000 + ... + (1000000000 - 1)
Joined with thread 0
Joined with thread 1
Joined with thread 2
Joined with thread 3
Joined with thread 4
Joined with thread 5
Joined with thread 6
Joined with thread 7
Joined with thread 8
Joined with thread 9
Summing worker totals
total=499999999500000000
Elapsed time: 6172 ms
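EDIT: Information on the environment:
I'm not sure how to prove it's a single-core machine, other than by stating the spec above and by noting that when I bought the machine (August 2005), single core was the standard and I didn't upgrade to multicore (if that was even an option... I don't remember). If there's somewhere in Windows I can check other than System Properties (which shows the info above), let me know and I'll check.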
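One way to check this from Java itself, rather than from Windows, is to ask the JVM how many processors it can see. The sketch below is only an illustration (the class name CoreCount is made up for this example). Note that the value is the number of logical processors, so a single-core CPU with Hyper-Threading enabled would typically report 2.

```java
// Reports how many logical processors the JVM can schedule threads on.
// CoreCount is a hypothetical name used only for this example.
final class CoreCount {
    static int logicalProcessors() {
        // Counts logical (not physical) processors available to the JVM.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors visible to the JVM: " + logicalProcessors());
    }
}
```

A result of 1 would support the single-core assumption; a result of 2 on this hardware would point toward Hyper-Threading.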
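Here are five consecutive runs each of ST and MT:
Five single-threaded runs:
total=499999999500000000 Elapsed time: 7000 ms
total=499999999500000000 Elapsed time: 7031 ms
total=499999999500000000 Elapsed time: 6922 ms
total=499999999500000000 Elapsed time: 6968 ms
total=499999999500000000 Elapsed time: 6938 ms
Five multithreaded runs:
total=499999999500000000 Elapsed time: 6047 ms
total=499999999500000000 Elapsed time: 6141 ms
total=499999999500000000 Elapsed time: 6063 ms
total=499999999500000000 Elapsed time: 6282 ms
total=499999999500000000 Elapsed time: 6125 ms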
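Answer 0 (score: 6)
This may be due to hyper-threading and/or pipelining.
From Wikipedia on hyper-threading:
Hyper-threading is an advancement over super-threading. Hyper-threading (officially termed Hyper-Threading Technology or HTT) is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. A processor with hyper-threading enabled is treated by the operating system as two processors instead of one. This means that only one processor is physically present, but the operating system sees two virtual processors and shares the workload between them.
From Wikipedia on pipelining:
In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.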
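Answer 1 (score: 3)
What is the rest of your environment like? Is this repeatable?
At least on a UNIX machine, a long-running single process like this is likely to have its priority lowered; if you have 10 threads, each one gets its own slice of the CPU and so doesn't accumulate as much CPU time. It then won't lose priority. Overall, it gets a larger total share of the CPU.
For completeness, here is what your code gives on a dual-core Mac mini under OS X 10.5.6: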
527 $ java MultithreadedCompute
Creating workers
Starting workers
Computing sum 100000000 + ... + (200000000 - 1)
Computing sum 0 + ... + (100000000 - 1)
Computing sum 400000000 + ... + (500000000 - 1)
Computing sum 200000000 + ... + (300000000 - 1)
Computing sum 500000000 + ... + (600000000 - 1)
Computing sum 600000000 + ... + (700000000 - 1)
Computing sum 700000000 + ... + (800000000 - 1)
Computing sum 800000000 + ... + (900000000 - 1)
Computing sum 900000000 + ... + (1000000000 - 1)
Computing sum 300000000 + ... + (400000000 - 1)
Joined with thread 0
Joined with thread 1
Joined with thread 2
Joined with thread 3
Joined with thread 4
Joined with thread 5
Joined with thread 6
Joined with thread 7
Joined with thread 8
Joined with thread 9
Summing worker totals
total=499999999500000000
Elapsed time: 3217 ms
528 $ java SingleThreadedCompute
total=499999999500000000
Elapsed time: 5651 ms
529 $
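As you can see, the threads don't necessarily run in order, and the multithreaded run time is about 57% of the single-threaded run time (3217 ms vs. 5651 ms), indicating that it is taking advantage of the threads.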
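The priority hypothesis above could be probed by comparing the CPU time a thread actually consumed with the wall-clock time that passed. The following is a rough sketch, not part of the original programs; the class name CpuVsWallTime and the helper sumRange are made up for illustration. It uses the standard ThreadMXBean API, which not every JVM supports, hence the guard.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class: compares CPU time with wall-clock time
// for the same summation loop used in the question.
final class CpuVsWallTime {
    /** Sums the integers in [start, end). */
    static long sumRange(long start, long end) {
        long total = 0;
        for (long i = start; i < end; i++) { total += i; }
        return total;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time is not supported on this JVM");
            return;
        }
        long wallStartNs = System.nanoTime();
        long cpuStartNs = threads.getCurrentThreadCpuTime();
        long total = sumRange(0, 1000000000L);
        long cpuMs = (threads.getCurrentThreadCpuTime() - cpuStartNs) / 1000000L;
        long wallMs = (System.nanoTime() - wallStartNs) / 1000000L;
        System.out.println("total=" + total);
        System.out.println("CPU time: " + cpuMs + " ms, wall-clock time: " + wallMs + " ms");
    }
}
```

If the wall-clock figure is noticeably larger than the CPU figure, the thread spent part of the run waiting for the processor, which is what a lowered priority would look like from the inside.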
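Answer 2 (score: 3)
I tried turning off the JIT, as Pax suggested in the comments above. Pax, if you want to post a quick "turn off the JIT" answer, I'll credit you with the solution.
Anyway, turning off the JIT worked (meaning that it brought the actual results in line with the expected results). I had to back off from one billion, since it was taking forever, so I went with 100 million instead. The results are much more in line with what I would expect. Here they are:
Five no-JIT single-threaded runs:
total=4999999950000000 Elapsed time: 17094 ms
total=4999999950000000 Elapsed time: 17109 ms
total=4999999950000000 Elapsed time: 17219 ms
total=4999999950000000 Elapsed time: 17375 ms
total=4999999950000000 Elapsed time: 17125 ms
Five no-JIT multithreaded runs:
total=4999999950000000 Elapsed time: 18719 ms
total=4999999950000000 Elapsed time: 18750 ms
total=4999999950000000 Elapsed time: 18610 ms
total=4999999950000000 Elapsed time: 18890 ms
total=4999999950000000 Elapsed time: 18719 ms
Thanks, everyone, for your ideas and help.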
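The post does not say how the JIT was turned off. On HotSpot JVMs, one common way is the -Xint launcher flag (interpreted-only mode); treat that as an assumption about this setup. The sketch below, with a made-up class name JitStatus, asks the running JVM whether it reports a JIT compiler, which is a quick way to confirm that a no-JIT run really was one. The API is documented to return null when the JVM has no compilation system; whether a particular JVM does so under -Xint is worth verifying locally.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class: reports whether this JVM exposes a JIT compiler.
final class JitStatus {
    /** Describes the JVM's JIT compiler, or notes that none is reported. */
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // Documented behavior when the JVM has no compilation system.
            return "no JIT compiler reported (interpreted only)";
        }
        String text = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", time spent compiling so far: " + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with the no-JIT option should show whether the option took effect.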
Answer 3 (score: 1)
A tenth of a second difference? Noise from startup time (alone) will swamp that. Write something that runs for a minute or two.
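A minimal sketch of that advice, assuming repeated trials inside one JVM are an acceptable stand-in for a longer-running program: time the same loop several times, so that startup and warm-up show up as outliers in the first trials instead of being folded into a single number. The class name RepeatedTiming and its helpers are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical harness: repeats the summation and reports per-trial timings.
final class RepeatedTiming {
    /** Sums the integers in [start, end). */
    static long sumRange(long start, long end) {
        long total = 0;
        for (long i = start; i < end; i++) { total += i; }
        return total;
    }

    /** Runs the summation of [0, n) `trials` times; returns each elapsed time in ms. */
    static long[] timeTrials(int trials, long n) {
        long[] elapsedMs = new long[trials];
        for (int t = 0; t < trials; t++) {
            long startNs = System.nanoTime();
            long total = sumRange(0, n);
            elapsedMs[t] = (System.nanoTime() - startNs) / 1000000L;
            // Printing the total keeps the result observably used.
            System.out.println("trial " + t + ": total=" + total + ", " + elapsedMs[t] + " ms");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        long[] times = timeTrials(10, 1000000000L);
        Arrays.sort(times);
        // For an even number of trials this picks the upper of the two middle values.
        System.out.println("min=" + times[0] + " ms, upper median=" + times[times.length / 2] + " ms");
    }
}
```

Comparing the minimum or median across trials is less sensitive to one-off noise than comparing two single runs.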
Answer 4 (score: 0)
Trying to eliminate HotSpot-induced differences between the code executed by the single-threaded and multithreaded variants:
public class ThreadedWorkers {
    private static final long _1B = 1000000000L; // one billion
    private static final long _100M = _1B / 10L;

    enum ThreadMode { SINGLE, SEQUENTIAL, MULTI };

    public static void main(String[] args) throws InterruptedException {
        final long startMs = System.currentTimeMillis();
        ThreadMode mode = args.length == 0 ? ThreadMode.SINGLE : ThreadMode.valueOf(args[0].toUpperCase());
        final long total = computeTotal( mode );
        System.out.println("total=" + total);
        long elapsedMs = System.currentTimeMillis() - startMs;
        System.out.println("Elapsed time: " + elapsedMs + " ms");
    }

    public static long computeTotal (ThreadMode mode) throws InterruptedException {
        Worker[] workers = new Worker[10];
        for (int i = 0; i < 10; i++)
            workers[i] = new Worker(i * _100M, (i+1) * _100M);
        switch (mode) {
            case SINGLE: {
                for (Worker worker : workers )
                    worker.run();
                break;
            }
            case SEQUENTIAL:{
                for (Worker worker : workers ) {
                    worker.start();
                    worker.join();
                }
                break;
            }
            case MULTI: {
                for (Worker worker : workers )
                    worker.start();
                for (Worker worker : workers )
                    worker.join();
                break;
            }
        }
        System.out.println("Summing worker totals");
        long total = 0;
        for (Worker worker : workers )
            total += worker.getTotal();
        return total;
    }

    static class Worker extends Thread {
        private long start, end, total;

        public Worker(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public void run() {
            System.out.println("Computing sum " + start + " + ... + (" + end + " - 1)");
            for (long i = start; i < end; i++) { total += i; }
        }

        public long getTotal() { return total; }
    }
}
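This still runs faster in multi mode than in sequential or single mode (by about 10 seconds on an Eee PC 900: 23 vs. 13 seconds), even though the sequential mode executes the same method the same number of times.
Answer 5 (score: 0)
Just because it's fun... results from an 8-core server-class machine. AMD 2.7 GHz Shanghai CPUs: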
Creating workers
Starting workers
Computing sum 0 + ... + (100000000 - 1)
Computing sum 100000000 + ... + (200000000 - 1)
Computing sum 300000000 + ... + (400000000 - 1)
Computing sum 500000000 + ... + (600000000 - 1)
Computing sum 600000000 + ... + (700000000 - 1)
Computing sum 200000000 + ... + (300000000 - 1)
Computing sum 800000000 + ... + (900000000 - 1)
Computing sum 700000000 + ... + (800000000 - 1)
Computing sum 900000000 + ... + (1000000000 - 1)
Computing sum 400000000 + ... + (500000000 - 1)
Joined with thread 0
Joined with thread 1
Joined with thread 2
Joined with thread 3
Joined with thread 4
Joined with thread 5
Joined with thread 6
Joined with thread 7
Joined with thread 8
Joined with thread 9
Summing worker totals
total=499999999500000000
Elapsed time: 444 ms