Question

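I need to build a Java cache that holds all cities and airports. If I query the cache for a location, say a city, it should return all airports in that city; if I query for an airport, I should get that airport back. Also, every location must be stored in the cache as a byte array, because the exposed interface for querying the cache takes byte[] as the location parameter. Other considerations:

Retrieval has to be very fast, as fast as possible.
The cache is loaded only once, at system startup. It does not change after it is loaded.
Since it is loaded only once, we can keep it sorted if that speeds up retrieval.

What I have so far:

Approach 1

Create a thin wrapper over the byte[] array, say ByteWrapper. Put each location (both airports and cities) as a key in a map (TreeMap?). Use a list of ByteWrappers (containing the applicable airports) as the value.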
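
A minimal sketch of approach 1, under stated assumptions: the class names ByteWrapper and LocationCacheSketch, the key helper, and the location codes used with it are all hypothetical and only illustrate the idea. The wrapper gives a byte[] content-based equals and hashCode so it can act as a map key, and it computes the hash once because the cache never changes after loading. A HashMap is used here; a TreeMap would additionally need the wrapper to be Comparable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical thin wrapper: gives a byte[] value semantics so it can be a map key.
final class ByteWrapper {
    private final byte[] bytes;
    private final int hash; // computed once; safe because the cache is read-only

    ByteWrapper(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the key cannot change later
        this.hash = Arrays.hashCode(this.bytes);
    }

    byte[] toBytes() { return bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof ByteWrapper && Arrays.equals(bytes, ((ByteWrapper) o).bytes);
    }

    @Override public int hashCode() { return hash; }
}

final class LocationCacheSketch {
    private final Map<ByteWrapper, List<ByteWrapper>> airportsByLocation = new HashMap<>();

    // Hypothetical helper: builds a key from an ASCII location code.
    static ByteWrapper key(String code) {
        return new ByteWrapper(code.getBytes(StandardCharsets.US_ASCII));
    }

    // Called only while loading at startup.
    void put(String location, String... airportCodes) {
        List<ByteWrapper> airports = new ArrayList<>();
        for (String code : airportCodes) {
            airports.add(key(code));
        }
        airportsByLocation.put(key(location), airports);
    }

    // Returns the airports stored for the location; an empty list if it is unknown.
    List<ByteWrapper> lookup(byte[] location) {
        List<ByteWrapper> hit = airportsByLocation.get(new ByteWrapper(location));
        return hit == null ? Collections.<ByteWrapper>emptyList() : hit;
    }
}
```

Loading would register each city with its airports and each airport with itself, for example put("LON", "LHR", "LGW") and put("LHR", "LHR"), so that both kinds of query go through the same lookup.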

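Approach 2

Create a multidimensional byte[] array sorted by location. It is essentially a map. Then use binary search to find the key and return the result.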
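
A rough sketch of approach 2, assuming the keys and their airport lists arrive as two parallel arrays at startup. The class name SortedByteArrayCache and the ORDER comparator are hypothetical; the ordering (lexicographic on unsigned byte values) is one possible choice, and any consistent total order would work as long as sorting and searching use the same one.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortedByteArrayCache {
    // Lexicographic order on unsigned byte values; on a shared prefix the shorter array sorts first.
    static final Comparator<byte[]> ORDER = new Comparator<byte[]>() {
        @Override public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int diff = (a[i] & 0xff) - (b[i] & 0xff);
                if (diff != 0) {
                    return diff;
                }
            }
            return a.length - b.length;
        }
    };

    private final byte[][] keys;     // sorted by ORDER
    private final byte[][][] values; // values[i] holds the airports for keys[i]

    // Takes parallel arrays and sorts them together, once, at startup.
    SortedByteArrayCache(byte[][] keys, byte[][][] values) {
        Integer[] perm = new Integer[keys.length];
        for (int i = 0; i < perm.length; i++) {
            perm[i] = i;
        }
        Arrays.sort(perm, (x, y) -> ORDER.compare(keys[x], keys[y]));
        this.keys = new byte[keys.length][];
        this.values = new byte[keys.length][][];
        for (int i = 0; i < perm.length; i++) {
            this.keys[i] = keys[perm[i]];
            this.values[i] = values[perm[i]];
        }
    }

    // O(log n) lookup; returns null when the location is unknown.
    byte[][] lookup(byte[] location) {
        int i = Arrays.binarySearch(keys, location, ORDER);
        return i >= 0 ? values[i] : null;
    }
}
```

This avoids wrapper objects and hashing entirely, at the cost of O(log n) comparisons per query instead of the expected O(1) of a hash lookup; which one is faster for a given data set is something only a benchmark can settle.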

Which approach would you suggest? If you have a better idea, please let me know. Thanks.

Answer 1

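The fact that the exposed API is based on byte[] should not necessarily dictate the internal details of your cache.

The second observation is that this is not a generalized data structure problem. Both the space of all airports and the space of all cities are finite and well known. (You even know their sizes.)

Hash maps, trees, and so on are all algorithms that guarantee certain performance characteristics within established bounds.

Since data integrity is not an issue ("the data does not change"), and if space considerations are not critical ("as fast as possible"), then why not:

[Edit: this bit somehow got lost in the cut and paste: you index (number) the cities and airports, since you know these sets and they are effectively static.]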

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// These need to be initialized on startup; the initialization can be optimized.
// byte[] has identity-based equals/hashCode, so the name keys are wrapped in
// ByteBuffer, which compares by content.
Map<ByteBuffer, Integer> airportIndexes = new HashMap<ByteBuffer, Integer>(NUMBER_OF_AIRPORTS);
Map<ByteBuffer, Integer> citiesIndexes = new HashMap<ByteBuffer, Integer>(NUMBER_OF_CITIES);

// Index -> name. Indexes are int because Java arrays cannot be indexed by long.
Map<Integer, byte[]> airports = new HashMap<Integer, byte[]>(NUMBER_OF_AIRPORTS);
Map<Integer, byte[]> cities = new HashMap<Integer, byte[]>(NUMBER_OF_CITIES);

// airportToCitiesMappings[a] holds the indexes of the cities near airport a, and vice versa.
int[][] airportToCitiesMappings = new int[NUMBER_OF_AIRPORTS][];
int[][] citiesToAirportMappings = new int[NUMBER_OF_CITIES][];


public List<byte[]> getCitiesNearAirport(byte[] airportName) {
   int[] cityIndexes = getCitiesByIdxNearAirport(airportName);
   List<byte[]> result = new ArrayList<byte[]>(cityIndexes.length);
   for (int cityIdx : cityIndexes) {
       result.add(cities.get(cityIdx));
   }
   return result;
}
public int[] getCitiesByIdxNearAirport(byte[] airportName) {
   // Assumes the name is known; an unknown name makes get() return null and the unboxing throw.
   return getCitiesByIdxNearAirport(airportIndexes.get(ByteBuffer.wrap(airportName)));
}
public int[] getCitiesByIdxNearAirport(int airportIdx) {
   return airportToCitiesMappings[airportIdx];
}
// .. repeat above pattern for airports.

That should give you O(1) time performance characteristics. There is considerable redundancy in terms of space.

Answer 2

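You don't need byte arrays; Strings will do fine.

How often will you add items to this cache? I'm guessing it is entirely static, since new cities and airports aren't being built every day.

So what you could do is use two MultiHashMaps, one keyed on city and the other keyed on airport. Check out the Google Multimap: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html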
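
As a rough illustration of the two-map idea, here is a sketch that uses plain JDK collections as a stand-in for a Multimap, so it does not show the Google API itself. The class name StringKeyedCache and its methods are hypothetical. It assumes an airport maps to itself, as the question describes, and converts the byte[] parameter to a String with ISO-8859-1, which maps every byte to exactly one char.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StringKeyedCache {
    // Plain-JDK stand-ins for the two multimaps: one keyed by city, one keyed by airport.
    private final Map<String, List<String>> airportsByCity = new HashMap<>();
    private final Map<String, List<String>> airportsByAirport = new HashMap<>();

    // Startup only: registers an airport under its city and under its own code.
    void addAirport(String city, String airport) {
        airportsByCity.computeIfAbsent(city, k -> new ArrayList<>()).add(airport);
        airportsByAirport.put(airport, Collections.singletonList(airport));
    }

    // Converts the byte[] location to a String key, then tries the city map before the airport map.
    List<String> lookup(byte[] location) {
        String key = new String(location, StandardCharsets.ISO_8859_1);
        List<String> hit = airportsByCity.get(key);
        if (hit == null) {
            hit = airportsByAirport.get(key);
        }
        return hit == null ? Collections.<String>emptyList() : hit;
    }
}
```

The String conversion on every query is extra work compared with keeping byte-based keys, so whether this is fast enough depends on the workload.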

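If you have the option of using MySQL, you could actually use a table backed by the MEMORY storage engine.

Many databases can pin tables in memory; Oracle definitely can, so that is another approach.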

Answer 3

Try approach 1. Since byte[] is an Object type, you can use something like:

Map<byte[], List<byte[]>> cache = ...

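This is probably the simplest approach; you only have to choose the Map implementation. You should probably use HashMap, since it is the simplest...

As gustavc said, using HashMap does not work, because arrays inherit identity-based equals and hashCode from Object. You can use a TreeMap with a given comparator instead: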

Map<byte[], List<byte[]>> m = new TreeMap<byte[], List<byte[]>>(new Comparator<byte[]>() {
    public int compare(byte[] o1, byte[] o2) {
        int result = (o1.length < o2.length ? -1 : (o1.length == o2.length ? 0 : 1));
        int index = 0;
        while (result == 0 && index < o1.length) {
            result = (o1[index] < o2[index] ? -1 : (o1[index] == o2[index] ? 0 : 1));
            index++;
        }
        return result;
    }
});
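
To make the difference concrete, here is a small sketch that looks up a content-equal but distinct byte[] in both kinds of map. The class ByteArrayKeyDemo and its method names are hypothetical; compareBytes follows the same ordering as the comparator above (shorter arrays first, then byte by byte).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class ByteArrayKeyDemo {
    // A HashMap compares byte[] keys by identity, so a fresh array with equal content is not found.
    static boolean hashMapFinds() {
        Map<byte[], String> m = new HashMap<byte[], String>();
        m.put(new byte[] {1, 2, 3}, "value");
        return m.containsKey(new byte[] {1, 2, 3});
    }

    // A TreeMap only consults the comparator, so equal content is enough to find the entry.
    static boolean treeMapFinds() {
        Map<byte[], String> m = new TreeMap<byte[], String>(ByteArrayKeyDemo::compareBytes);
        m.put(new byte[] {1, 2, 3}, "value");
        return m.containsKey(new byte[] {1, 2, 3});
    }

    // Shorter arrays sort first; arrays of equal length are compared byte by byte.
    static int compareBytes(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return Integer.compare(a.length, b.length);
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Byte.compare(a[i], b[i]);
            }
        }
        return 0;
    }
}
```

Here hashMapFinds() returns false and treeMapFinds() returns true, which is why a raw byte[] key needs either a comparator-based map or a wrapper that supplies content-based equals and hashCode.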

Answer 4

This is what I have done so far:

private static byte[][][] cache = null; // this is the actual cache
// This map has ByteArrayWrapper (a wrapper over byte[]) as key, which
// can be an airport or a city, and the index of the corresponding
// airport/airports in byte[][][] cache as value.
Map<ByteArrayWrapper, Integer> byteLocationIndexes = null;
/**
 * This is how the cache is queried. You can pass an airport or city as the location parameter.
 * It will fetch the corresponding airport/airports, or null if the location is unknown.
 */
private byte[][] getAllAirportsForLocation(ByteArrayWrapper location) {
    // Look the key up once instead of twice; the map lookup is the hot path.
    Integer index = byteLocationIndexes.get(location);
    return index == null ? null : cache[index.intValue()];
}

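I benchmarked using String as the key in the index map (with a String[][] cache) against ByteArrayWrapper as the key (with a byte[][][] cache). Using ByteArrayWrapper and the byte[][][] cache gives a 15-20% improvement.

What else can be done to improve performance? Would it help to use some other implementation of Map? Since the cache is loaded only once and never changes, it can be sorted. Most of the time goes into the key lookup in byteLocationIndexes, which is the bottleneck. I already compute the hashCode when the object is created and store it as a local variable in ByteArrayWrapper.

Any suggestions?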

How to implement a cache with binary arrays as keys and binary arrays as values in Java
