迭代Guava Cache中的值会丢失数据

时间:2018-01-26 10:39:30

标签: java guava jmh google-guava-cache

我开始基准测试在Guava缓存中按值查找密钥的方法,并且我注意到与并发级别相关的奇怪行为。我不确定这是一个错误或未定义的行为,甚至可能是预期但未指定。

我的基准测试应该在Guava Cache中找到键值,而不是通常要做的事情,我知道。

这是我完整的基准测试类:

@Fork(4)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {

    private Long counter = 0L;

    private final int MAX = 2500;

    private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(1)
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });

    private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
    private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);

    @Setup(Level.Trial)
    public void setup() {
        // Populate guava cache
        for(int i = 0; i <= MAX; i++) {
            try {
                stringToLong.get(UUID.randomUUID().toString());
            } catch (ExecutionException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    @Benchmark
    public String stringToIdByIteration() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationHashTable() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

@Benchmark
    public String stringToIdByIterationConcurrentHashMap() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
        return null;
    }

    private Long generateIdByString(final String mString) {
        mHashMap.put(mString, counter++);
        concurrentHashMap.put(mString, counter);
        return counter;
    }

}

我注意到当我将.concurrencyLevel(1)更改为不同于1的数字时,我开始丢失数据。以下输出来自并发级别4:

Iteration   1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op

我注意到在使用HashMapHashTable使用相同的代码时,我从未丢失任何数据,它也表现得更好:

Benchmark Mode Cnt Score Error Units ValueByKey.stringToIdByIteration avgt 16 58.637 ± 15.094 us/op ValueByKey.stringToIdByIterationConcurrentHashMap avgt 16 16.148 ± 2.046 us/op ValueByKey.stringToIdByIterationHashTable avgt 16 11.705 ± 1.095 us/op

我的代码是错误的还是Guava无法正确处理并发级别高于1的分区HashTable?

  
      
  • 并发级别选项用于在内部对表进行分区,以便可以在不发生争用的情况下进行更新。
  •   
  • 理想的设置是一次可能访问缓存的最大线程数。
  •   

1 个答案:

答案 0 :(得分:2)

没有缓存保证缓存命中

缓存中数据的存在/不存在由逐出策略决定(首先将数据加载到缓存中)。

由于您已使用CacheBuilder.maximumSize(MAX + 5),因此您的缓存将使用基于尺寸的逐出,并且会在之前删除元素达到预设的最大尺寸。

当并发级别设置为 4 时,番石榴缓存会安全地播放它并将驱逐阈值设置得更低,这是有道理的,因为元素可以在被驱逐时继续到达。

这就是您的元素开始消失的原因。

要对此进行测试,请使您的类实现RemovalListener interface:

public class ValueByKey implements RemovalListener<String, Long> { 
    //...
    @Override
    public void onRemoval(RemovalNotification<String, Long> notification) {
        System.out.println("removed: " + notification.getKey() + " -> " + notification.getValue());
    }
    //...
}

...在运行测试时,您会注意到与缺失值匹配的驱逐:

# Warmup Iteration   1: 
removed: 110c0a73-1dc3-40ee-8909-969e6dee0ea0 -> 3
removed: 6417015a-f154-467f-b3bf-3b95831ac5b7 -> 6
removed: 5bc206f9-67ec-49a2-8471-b386ffc03988 -> 14
removed: 3c0a33e1-1fe1-4e42-b262-bf6a3e8c53f7 -> 21
Returning null as value not found 14
Returning null as value not found 14
Returning null as value not found 3
64.778 us/op
Iteration   1: 
Returning null as value not found 21
Returning null as value not found 21
Returning null as value not found 6
37.719 us/op
[...]

我可以想象驱逐的阈值计算可能很复杂,但在我的机器上,在运行基准时,将最大尺寸增加5%(CacheBuilder.maximumSize(Math.round(MAX * 1.05)))会阻止所有驱逐。