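False sharing only becomes noticeable on some machines

Date: 2013-09-13 10:23:25

Tags: java multithreading false-sharing

I wrote the following test class in Java to reproduce the performance penalty introduced by "false sharing".

Basically, you can change the array "size" from 4 to a larger value (e.g. 10000) to switch the false-sharing effect on or off. With size = 4, interval = 1, so the four threads update adjacent ints (array[5000] to array[5003]). These are very likely to sit in the same cache line, which then bounces between cores and causes more frequent cache misses. In theory, the test program should therefore run faster with size = 10000 than with size = 4.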
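To make the size = 4 versus size = 10000 difference concrete, here is a small Java sketch that computes which cache line each thread's int falls on. It assumes 64-byte cache lines and, hypothetically, that element 0 of the array starts on a line boundary. The JVM does not guarantee that, so real line numbers can shift, and with an unlucky offset the four adjacent ints could straddle two lines. `CacheLineMath`, `lineOf` and `linesFor` are illustrative names, not part of the question.

```java
import java.util.Arrays;

public class CacheLineMath {
    // Assumption: 64-byte cache lines (typical for x86) and 4-byte int elements.
    static final int LINE_BYTES = 64;
    static final int INT_BYTES = 4;

    // Cache line, counted from the first array element, that holds element `index`.
    // Hypothetically assumes element 0 starts exactly on a line boundary.
    static int lineOf(int index) {
        return index * INT_BYTES / LINE_BYTES;
    }

    // Lines touched by the four threads of the test: index 5000 + interval * t.
    static int[] linesFor(int size) {
        int interval = size / 4;
        int[] lines = new int[4];
        for (int t = 0; t < 4; t++) {
            lines[t] = lineOf(5000 + interval * t);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linesFor(4)));     // [312, 312, 312, 312]
        System.out.println(Arrays.toString(linesFor(10000))); // [312, 468, 625, 781]
    }
}
```

With size = 4 all four writes map to one line; with size = 10000 each thread gets its own line.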

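I ran the same test several times on two different machines:

Machine A: Lenovo X230 laptop with an Intel® Core™ i5-3210M (2 cores, 4 threads), Windows 7 64-bit

size = 4 => 5.5 seconds

size = 10000 => 5.4 seconds

Machine B: Dell OptiPlex 780 with an Intel® Core™2 Duo E8400 (2 cores), Windows XP 32-bit

size = 4 => 14.5 seconds

size = 10000 => 7.2 seconds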

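I later ran the test on several other machines, and it is clear that false sharing only becomes noticeable on some of them. I cannot figure out which factor decides this difference.

Could anyone take a look and explain why the false sharing introduced in this test class only becomes noticeable on some machines?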

public class FalseSharing {

    interface Oper {
        int eval(int value);
    }

    // Try tweaking the size. With size = 4, interval = 1 and the four threads
    // write adjacent ints (array[5000] to array[5003]); with size = 10000,
    // interval = 2500 and the written ints are 10000 bytes apart.
    static int size = 4;

    // Try tweaking the op.
    static Oper op = new Oper() {
        @Override
        public int eval(int value) {
            return value + 2;
        }
    };

    static int[] array = new int[10000 + size];

    static final int interval = (size / 4);

    public static void main(String[] args) throws InterruptedException {

        long start = System.currentTimeMillis();
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + 5000);

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000] = op.eval(array[5000]);
                    }
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval] = op.eval(array[5000 + interval]);
                    }
                }
            }
        });
        Thread t3 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 2));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 2] = op.eval(array[5000 + interval * 2]);
                    }
                }
            }
        });
        Thread t4 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 3));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 3] = op.eval(array[5000 + interval * 3]);
                    }
                }
            }
        });
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t1.join();
        t2.join();
        t3.join();
        t4.join();
        System.out.println("Finished!" + (System.currentTimeMillis() - start));
    }

}
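For experimenting with more spacings than just 4 and 10000, the same workload can be written with the four threads created in a loop and the interval passed in as a parameter. This is a sketch, not the asker's code: `FalseSharingHarness`, `run` and the iteration count are hypothetical, and the update is inlined as `+= 2` instead of going through `op.eval`.

```java
public class FalseSharingHarness {
    // Four threads; thread t repeatedly adds 2 to array[5000 + interval * t],
    // mirroring the question's layout (array length 10000 + size, size = 4 * interval).
    static int[] run(final int interval, final int iterations) throws InterruptedException {
        final int[] array = new int[10000 + 4 * interval];
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            final int index = 5000 + interval * t;
            threads[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < iterations; i++) {
                        array[index] += 2; // same update as op.eval: value + 2
                    }
                }
            });
        }
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join(); // join() makes each thread's writes visible here
        return array;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int interval : new int[] {1, 2, 4, 8, 16, 2500}) {
            long start = System.nanoTime();
            run(interval, 200_000_000); // hypothetical iteration count
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("interval=" + interval + " ints -> " + ms + " ms");
        }
    }
}
```

Timings from hand-written loops like this can be distorted by JIT optimizations of the inner loop, so treat them as rough; a benchmark harness such as JMH is commonly used to guard against that.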

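2 answers:

Answer 0 (score: 0)

False sharing only happens within a 64-byte block (one cache line on typical x86 CPUs), so all four threads need to access the same 64-byte block. I suggest creating an object, or a long[8] array, having the four threads update different cells of it, and comparing that with four threads that each access an independent array.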
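A minimal Java sketch of the comparison suggested above, assuming four threads and a 64-byte cache line. `SharedLineVsSeparate`, `time` and the iteration count are made up for illustration. Note that the JVM does not guarantee that a long[8]'s data starts on a cache-line boundary, so the "shared" case may actually straddle two lines; it still packs all four hot cells into at most two lines.

```java
public class SharedLineVsSeparate {
    static final int THREADS = 4;

    // Runs THREADS threads; thread t increments slots[t][offsets[t]] `iterations` times.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long time(final long[][] slots, final int[] offsets, final long iterations)
            throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final long[] a = slots[t];
            final int idx = offsets[t];
            threads[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (long i = 0; i < iterations; i++) {
                        a[idx]++;
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long iterations = 100_000_000L; // hypothetical; adjust so each run takes about a second

        // Case 1: one long[8] (64 bytes of data); thread t updates cell t.
        long[] shared = new long[8];
        long sharedNs = time(new long[][] {shared, shared, shared, shared},
                new int[] {0, 1, 2, 3}, iterations);

        // Case 2: four independent long[8] arrays; each thread updates cell 0 of its own.
        long separateNs = time(new long[THREADS][8], new int[] {0, 0, 0, 0}, iterations);

        System.out.printf("shared long[8]: %d ms, separate arrays: %d ms%n",
                sharedNs / 1_000_000, separateNs / 1_000_000);
    }
}
```

On a machine where false sharing shows up, the shared case would be expected to run noticeably slower than the separate-array case.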

Answer 1 (score: 0)

Your code is probably fine. Here are the results from a simpler version:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;


public class TestFalseSharing {
    static long T0 = System.currentTimeMillis();

    // Print a message prefixed with the elapsed seconds and the current thread name.
    static void p(Object msg) {
        System.out.format("%09.3f %-10s %s%n", 0.001 * (System.currentTimeMillis() - T0), Thread.currentThread().getName(), " : " + msg);
    }

    public static void main(String[] args) throws InterruptedException {
        int NT = Runtime.getRuntime().availableProcessors();
        p("Available processors: " + NT);

        int MAXSPAN = 0x1000; // 4 KB
        final byte[] array = new byte[NT * MAXSPAN];

        // Distance between the threads' bytes: 1, 2, 4, ..., 4096.
        for (int i = 1; i <= MAXSPAN; i <<= 1) {
            testFalseSharing(NT, i, array);
        }
    }

    static void testFalseSharing(final int NT, final int span, final byte[] array) throws InterruptedException {
        final int L1 = 10;
        final int L2 = 10_000_000;

        // Each of the NT threads counts down once per outer iteration.
        final CountDownLatch cl = new CountDownLatch(NT * L1);

        long t0 = System.nanoTime();

        // One thread per processor, so the latch count NT * L1 is actually reached.
        for (int i = 0; i < NT; i++) {
            final int startOffset = i * span; // thread i only writes the byte at i * span

            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    //p("Offset:" + startOffset);
                    for (int j = 0; j < L1; j++) {
                        for (int k = 0; k < L2; k++) {
                            array[startOffset] += 1;
                        }
                        cl.countDown();
                    }
                }
            });
            t.start();

        }

        // Wait for all threads, printing progress every 10 seconds.
        while (!cl.await(10, TimeUnit.SECONDS)) {
            p("" + cl.getCount() + " left");
        }

        long d = System.nanoTime() - t0;
        p("Duration: " + 1e-9 * d + " seconds, Span=" + span + " bytes");
    }
}

Results:

00000.000 main        : Available processors: 4
00002.843 main        : Duration: 2.837645384 seconds, Span=1 bytes
00005.689 main        : Duration: 2.8454065760000002 seconds, Span=2 bytes
00008.659 main        : Duration: 2.9697156340000004 seconds, Span=4 bytes
00011.640 main        : Duration: 2.979306959 seconds, Span=8 bytes
00013.780 main        : Duration: 2.140246744 seconds, Span=16 bytes
00015.387 main        : Duration: 1.6061148440000002 seconds, Span=32 bytes
00016.729 main        : Duration: 1.34128957 seconds, Span=64 bytes
00017.944 main        : Duration: 1.215005455 seconds, Span=128 bytes
00019.208 main        : Duration: 1.263007368 seconds, Span=256 bytes
00020.477 main        : Duration: 1.269272208 seconds, Span=512 bytes
00021.719 main        : Duration: 1.241061631 seconds, Span=1024 bytes
00022.975 main        : Duration: 1.256024242 seconds, Span=2048 bytes
00024.171 main        : Duration: 1.195086858 seconds, Span=4096 bytes

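So, to answer: the run time falls from about 2.8 to 3.0 seconds at spans of 1 to 8 bytes to about 1.2 to 1.3 seconds at spans of 64 bytes and above, which confirms the 64-byte cache-line theory, at least on my Core i5 laptop.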