BerkeleyDB JE random access time increases non-linearly

Time: 2012-06-19 09:24:12

Tags: java performance nosql berkeley-db

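I am testing BerkeleyDB Java Edition to see whether I can use it in my project.

I wrote a very simple program that works with an object of class com.sleepycat.je.Database:

  • writes N records of 5-15 KB each, with keys generated like Long.toString(random.nextLong());

  • reads these records back with Database#get, in the same order in which they were created;

  • reads the same number of records with Database#get, in random order.

Now I see something strange. The execution time of the third test grows very non-linearly as the number of records increases.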

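  • N = 80000, write = 55 sec, sequential fetch = 17 sec, random fetch = 3 sec
  • N = 100000, write = 60 sec, sequential fetch = 20 sec, random fetch = 7 sec
  • N = 120000, write = 68 sec, sequential fetch = 27 sec, random fetch = 11 sec
  • N = 140000, write = 82 sec, sequential fetch = 32 sec, random fetch = 47 sec

(Of course, I have run the tests several times.)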
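
To make the non-linearity easier to see, the totals above can be converted into an average cost per random fetch. The following is only a small arithmetic sketch; the class name PerRecordCost and its method are made up for illustration, and the inputs are the random-fetch timings listed above.

```java
import java.util.Locale;

public class PerRecordCost {

    /** Average cost of one fetch in microseconds, given the total time and the record count. */
    public static double microsPerRecord(double totalSeconds, int records) {
        return totalSeconds * 1000000.0 / records;
    }

    public static void main(String[] args) {
        // Random-fetch timings taken from the measurements in the question.
        int[] n = {80000, 100000, 120000, 140000};
        double[] randomFetchSeconds = {3, 7, 11, 47};

        for (int i = 0; i < n.length; i++) {
            System.out.printf(Locale.ROOT, "N = %d: %.1f microseconds per random fetch%n",
                    n[i], microsPerRecord(randomFetchSeconds[i], n[i]));
        }
    }
}
```

The per-fetch cost goes from about 37.5 microseconds at N = 80000 to about 335.7 microseconds at N = 140000, roughly a ninefold increase for 1.75 times as many records.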

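I suppose I am doing something wrong. Here is the source for reference (sorry, it is a bit long); the methods are called in the order shown: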

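// Imports needed by the methods below (the enclosing class declaration is omitted)
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

private Environment env;
private Database db;
private Random random = new Random();
private List<String> keys = new ArrayList<String>(); // keys in insertion order, used by fetchRandom
private int seed = 113;                              // fixed seed so fetchRecords can regenerate the same data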


public boolean dbOpen() {
    EnvironmentConfig ec = new EnvironmentConfig();
    DatabaseConfig dc = new DatabaseConfig();
    ec.setAllowCreate(true);
    dc.setAllowCreate(true);
    env = new Environment(new File("mydbenv"), ec);
    db = env.openDatabase(null, "moe", dc);
    return true;
}

public int storeRecords(int i) {
    int j;
    long size = 0;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();

    random.setSeed(seed);

    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        keys.add(k);

        size += data.length;

        random.nextBytes(data);
        key.setData(k.getBytes());
        val.setData(data);
        db.put(null, key, val);
    }

    System.out.println("GENERATED SIZE: " + size);

    return j;
}                   

public int fetchRecords(int i) {
    int j, res;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();

    random.setSeed(seed);
    res = 0;

    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        random.nextBytes(data);
        key.setData(k.getBytes());
        db.get(null, key, val, null);
        if (Arrays.equals(data, val.getData())) {
            res++;
        } else {
            System.err.println("FETCH differs: " + j);
            System.err.println(data.length + " " + val.getData().length);
        }
    }

    return res;
}

public int fetchRandom(int i) {
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();

    for (int j = 0; j < i; j++) {
        String k = keys.get(random.nextInt(keys.size()));
        key.setData(k.getBytes());
        db.get(null, key, val, null);
    }

    return i;
}

1 Answer:

Answer 0: (score: 1)

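The performance degradation is non-linear for two reasons:

  1. The BDB-JE data structure is a B-tree, which has O(log(n)) performance for retrieving a single record. Retrieving all records through the get method is therefore O(n * log(n)).
  2. A large data set does not fit in RAM, so disk access slows everything down. Random access has very poor cache locality.

Note that you can improve write performance by giving up some durability: ec.setTxnWriteNoSync(true);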
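
The two reasons can be put into rough numbers. The sketch below is an abstract model, not a description of JE's actual cache: it assumes a plain LRU cache holding whole records, and the class name CacheLocalitySketch and its methods are invented for illustration. The first method computes how much n * log(n) grows between two record counts; the second simulates the hit rate of uniformly random lookups against a cache of fixed capacity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class CacheLocalitySketch {

    /** Growth factor of n * log2(n) when going from n1 to n2 records. */
    public static double nLogNRatio(int n1, int n2) {
        double a = n1 * (Math.log(n1) / Math.log(2));
        double b = n2 * (Math.log(n2) / Math.log(2));
        return b / a;
    }

    /**
     * Fraction of uniformly random lookups over `records` keys that are served
     * from an LRU cache holding at most `capacity` entries (assumed model).
     */
    public static double hitRate(final int capacity, int records, int lookups, long seed) {
        // An access-ordered LinkedHashMap that drops its eldest entry behaves as a simple LRU cache.
        Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        Random random = new Random(seed);

        // Warm-up pass so the measurement starts with a populated cache.
        for (int i = 0; i < lookups; i++) {
            lru.put(random.nextInt(records), Boolean.TRUE);
        }

        int hits = 0;
        for (int i = 0; i < lookups; i++) {
            Integer key = random.nextInt(records);
            if (lru.get(key) != null) {
                hits++;                      // served from the cache
            } else {
                lru.put(key, Boolean.TRUE);  // miss: load the record, possibly evicting another
            }
        }
        return (double) hits / lookups;
    }

    public static void main(String[] args) {
        System.out.println("n*log(n) growth, 80000 -> 140000: " + nLogNRatio(80000, 140000));
        System.out.println("hit rate, data fits in cache:      " + hitRate(1000, 1000, 200000, 113));
        System.out.println("hit rate, data is 4x the cache:    " + hitRate(1000, 4000, 200000, 113));
    }
}
```

Under these assumptions the n * log(n) term grows by less than a factor of two between 80000 and 140000 records, while the hit rate drops to roughly capacity / records as soon as the data no longer fits. Every miss costs a disk read, which is why the second reason tends to dominate once the working set outgrows the cache.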

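You might also want to try Tupl, an open-source BerkeleyDB replacement that I have been working on. It is still in the alpha stage, but you can find it on SourceForge.

To make the comparison between BDB-JE and Tupl fair, I set the cache size to 500M and performed an explicit checkpoint at the end of the store method.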
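
The BDB-JE side of that setup is not shown in this answer. As a configuration sketch only, it might look roughly like the following, using JE's EnvironmentConfig.setCacheSize and Environment.checkpoint. The class name JeBenchSetup, the method names, and the exact byte value chosen for "500M" are assumptions; the code needs the JE library on the classpath and assumes a JE version in which DatabaseException is unchecked, as the question's code does.

```java
import com.sleepycat.je.CheckpointConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

public class JeBenchSetup {

    /** Opens an environment with a fixed cache size (assumed here to be 500,000,000 bytes). */
    public static Environment open(File home) {
        EnvironmentConfig ec = new EnvironmentConfig();
        ec.setAllowCreate(true);
        ec.setCacheSize(500000000L); // same figure the Tupl test passes to minCacheSize
        return new Environment(home, ec);
    }

    /** Forces a checkpoint, intended to be called at the end of storeRecords. */
    public static void forceCheckpoint(Environment env) {
        CheckpointConfig cc = new CheckpointConfig();
        cc.setForce(true); // run the checkpoint even if JE would otherwise skip it
        env.checkpoint(cc);
    }
}
```

Checkpointing before the fetch phase keeps write-back work from being charged to the reads, which makes the store and fetch timings easier to compare.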

With BDB-JE:

  • N = 80000, write = 11.0 sec, fetch = 5.3 sec
  • N = 100000, write = 13.6 sec, fetch = 7.0 sec
  • N = 120000, write = 16.4 sec, fetch = 29.5 sec
  • N = 140000, write = 18.8 sec, fetch = 35.9 sec
  • N = 160000, write = 21.5 sec, fetch = 41.3 sec
  • N = 180000, write = 23.9 sec, fetch = 46.4 sec

With Tupl:

  • N = 80000, write = 21.7 sec, fetch = 4.4 sec
  • N = 100000, write = 27.6 sec, fetch = 6.3 sec
  • N = 120000, write = 30.2 sec, fetch = 8.4 sec
  • N = 140000, write = 35.4 sec, fetch = 12.2 sec
  • N = 160000, write = 39.9 sec, fetch = 17.4 sec
  • N = 180000, write = 45.4 sec, fetch = 22.8 sec

Because of its log-based format, BDB-JE is faster at writing entries. Tupl, however, is faster at reading. Here is the source for the Tupl test:

    import java.io.*;
    import java.util.*;

    import org.cojen.tupl.*;

    public class TuplTest {
    // Usage: TuplTest <base file> <number of records>
    public static void main(final String[] args) throws Exception {
        final TuplTest rt = new TuplTest();
        rt.dbOpen(args[0]);

        {
            long start = System.currentTimeMillis();
            rt.storeRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("store duration: " + (end - start));
        }
    
        {
            long start = System.currentTimeMillis();
            rt.fetchRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("fetch duration: " + (end - start));
        }
    }
    
    private Database db;
    private Index ix;
    private Random random = new Random();
    private List<String> keys = new ArrayList<String>();
    private int seed = 113;
    
    public boolean dbOpen(String home) throws Exception {
        DatabaseConfig config = new DatabaseConfig();
        config.baseFile(new File(home));
        config.durabilityMode(DurabilityMode.NO_FLUSH);
        config.minCacheSize(500000000);
        db = Database.open(config);
        ix = db.openIndex("moe");
        return true;
    }
    
    public int storeRecords(int i) throws Exception {
        int j;
        long size = 0;
    
        random.setSeed(seed);
    
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            keys.add(k);
    
            size += data.length;
    
            random.nextBytes(data);
            ix.store(null, k.getBytes(), data);
        }
    
        System.out.println("GENERATED SIZE: " + size);
    
        db.checkpoint();
        return j;
    }
    
    public int fetchRecords(int i) throws Exception {
        int j, res;
    
        random.setSeed(seed);
        res = 0;
    
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            random.nextBytes(data);
            byte[] val = ix.load(null, k.getBytes());
            if (Arrays.equals(data, val)) {
                res++;
            } else {
                System.err.println("FETCH differs: " + j);
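                // load returns null when the key is missing, so guard against a NullPointerException
                System.err.println(data.length + " " + (val == null ? "not found" : String.valueOf(val.length)));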
            }
        }
    
        return res;
    }
    
    public int fetchRandom(int i) throws Exception {
        for (int j = 0; j < i; j++) {
            String k = keys.get(random.nextInt(keys.size()));
            ix.load(null, k.getBytes());
        }
    
        return i;
    }
    

    }