ETS read concurrency tweak

Time: 2015-05-12 10:17:28

Tags: multithreading concurrency erlang

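I'm playing with ets tweaks, specifically read_concurrency. I wrote a simple test to measure how this tweak affects read performance. The test implementations are here and there.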

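In short, this test sequentially creates three [public, set] ets tables with different read_concurrency options (without any tweaks, with {read_concurrency, true}, and with {read_concurrency, false}). After a table is created, the test runs N readers (N is a power of 2 from 4 to 1024). The readers then perform random reads for 10 seconds and report how many read operations they performed.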
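
The three table configurations described above can be sketched roughly as follows. The original test code is not reproduced in this post, so the module name `table_setup`, the function names, and the table name `bench_table` are all hypothetical and only illustrate the options being compared.

```erlang
-module(table_setup).
-export([table_options/0, with_each_table/1]).

%% The three option lists compared in the question (hypothetical helper).
table_options() ->
    [[public, set],
     [public, set, {read_concurrency, true}],
     [public, set, {read_concurrency, false}]].

%% Creates each table in turn, runs Fun(Table) on it, then deletes it.
%% Returns a list of {Options, Result} pairs.
with_each_table(Fun) ->
    [begin
         Table = ets:new(bench_table, Opts),
         Result = Fun(Table),
         true = ets:delete(Table),
         {Opts, Result}
     end || Opts <- table_options()].
```

A benchmark would pass its reader-spawning function as `Fun`, for example `table_setup:with_each_table(fun(T) -> ets:info(T, type) end)` just to confirm that each table is created as a `set`.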

The result was quite surprising to me: there is no real difference between these three tests. Here are the test results.

Non-tweaked table
4 workers: 26610428 read operations
8 workers: 26349134 read operations
16 workers: 26682405 read operations
32 workers: 26574700 read operations
64 workers: 26722352 read operations
128 workers: 26636100 read operations
256 workers: 26714087 read operations
512 workers: 27110860 read operations
1024 workers: 27545576 read operations

Read concurrency true
4 workers: 30257820 read operations
8 workers: 29991281 read operations
16 workers: 30280695 read operations
32 workers: 30066830 read operations
64 workers: 30149273 read operations
128 workers: 28409907 read operations
256 workers: 28381452 read operations
512 workers: 29253088 read operations
1024 workers: 30955192 read operations

Read concurrency false
4 workers: 30774412 read operations
8 workers: 29596126 read operations
16 workers: 24963845 read operations
32 workers: 29144684 read operations
64 workers: 29862287 read operations
128 workers: 25618461 read operations
256 workers: 27457268 read operations
512 workers: 28751960 read operations
1024 workers: 28790131 read operations

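So I wonder how I should implement my test to see any difference, and what the use case for this optimization is.

I have run this test on the following installations: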

  1. 2 cores, 1 physical CPU, Erlang/OTP 17 [erts-6.1] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false] (the example test output is from this run)
  2. 2 cores, 1 physical CPU, Erlang/OTP 17 [erts-6.1] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:true]
  3. 8 cores, 1 physical CPU, Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
  4. 8 cores, 1 physical CPU, Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:true]
  5. 64 cores, 4 physical CPUs, Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:64:64] [async-threads:10] [hipe] [kernel-poll:false]
  6. 64 cores, 4 physical CPUs, Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:64:64] [async-threads:10] [hipe] [kernel-poll:true]

It's all the same (except for the absolute measurement values, of course). So can anybody tell me why? And what should I do to see any difference?

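UPD Following Fred's answer, I've updated my test so that workers no longer thrash their mailboxes. Unfortunately, there was no significant change in the results.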

UPD Implemented according to @Pascal's suggestion. Now all workers seed their random generators properly. Same results again.

2 answers:

Answer 0: (score: 1)

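It's possible that the bulk of your benchmark is testing the scheduling abilities of the node: almost half of the work done in it is polling your mailbox to find out whether you should exit. This usually requires the VM to switch out each process, put it in the queue, run the others, check their mailboxes, and so on. That is cheap to do, but so is reading from ETS. It's quite possible you are creating a lot of noise.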
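
For illustration, a reader loop of the kind described here might look like the sketch below. The question's actual test code is not shown in this post, so this is an assumption about its shape; the module name `poll_reader`, the `stop` message, and the fixed key `1` are hypothetical.

```erlang
-module(poll_reader).
-export([run/2]).

%% Starts one reader on Table, lets it run for Millis milliseconds,
%% then asks it to stop and returns the number of lookups it performed.
run(Table, Millis) ->
    Parent = self(),
    Pid = spawn_link(fun() -> loop(Table, Parent, 0) end),
    timer:sleep(Millis),
    Pid ! stop,
    receive {Pid, Count} -> Count end.

%% Every iteration first checks the mailbox with a zero timeout and only
%% then performs a single lookup, so mailbox polling is paid once per read.
loop(Table, Parent, Count) ->
    receive
        stop -> Parent ! {self(), Count}
    after 0 ->
        _ = ets:lookup(Table, 1),
        loop(Table, Parent, Count + 1)
    end.
```

With a loop like this, each `ets:lookup/2` call is paired with a `receive ... after 0`, so the measured throughput reflects both costs rather than the table read alone.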

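An alternative approach is to have every worker read from the table N million times and measure how long it takes until they are all done. This reduces the amount of non-ETS work done on the node and focuses the measurement on reading from the table.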

I can't guarantee anything, but I'd bet that the tables with more concurrency would get faster runs.

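Answer 1: (score: 1)

I made a new version of your code, in which I added a boolean parameter to either perform or skip the ets access. Without doubt, most of the time is spent on things other than the ets read:

[edit]

After @Viacheslav's comment, I now initialize the table... with almost no effect.

The code: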

-module(perf).

-export([tests/0]).

-define(TABLE_SIZE, 100).
-define(READS_COUNT, 5000000).

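%% Runs one benchmark: WkCount workers each perform NbRead iterations.
%% Doit selects whether the loop really calls ets:lookup/2 (true) or only
%% generates the random key (false). Returns the elapsed time in microseconds.
read_test(Doit,WkCount,NbRead,TableOpt) ->
    Table = ets:new(?MODULE, TableOpt),
    [ ets:insert(Table, {I, something}) || I <- lists:seq(1, ?TABLE_SIZE)],
    %% One erlang:now() value per worker, used as its random seed.
    L = [erlang:now() || _ <- lists:seq(1,WkCount)],
    F = fun() -> spawn_readers(Doit,WkCount,NbRead,Table,L) end,
    {T,_} = timer:tc(F),
    ets:delete(Table),
    T.

table_types() ->
    [[public, set, {read_concurrency, false}],
     [public, set, {read_concurrency, true}],
     [public, set]].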

spawn_readers(Doit,WkCount, NbRead, Table, L_init) ->
    [spawn_monitor( fun() -> reader(Doit,NbRead, Table, X) end) || X <- L_init],
    reap_workers(WkCount).

reader(Doit,NbRead, Table, Seed) ->
    random:seed(Seed),
    reader_loop(Doit,NbRead,Table).

reader_loop(_,0,_Table) ->
    ok;
reader_loop(true,ToRead,Table) ->
    Key = random:uniform(?TABLE_SIZE),
    ets:lookup(Table, Key),
    reader_loop(true,ToRead-1, Table);
reader_loop(false,ToRead,Table) ->
    _Key = random:uniform(?TABLE_SIZE),
    reader_loop(false,ToRead-1, Table).

reap_workers(0) ->
    ok;
reap_workers(Count) ->
    receive
        {'DOWN', _, process, _, _} ->
            reap_workers(Count-1)
    end.

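%% Each result is {TableOpts, number_proc, Workers, TimeWithLookup, TimeWithoutLookup},
%% with both times in microseconds as returned by timer:tc/1.
tests() ->
    [[{X,number_proc,Y,read_test(true,Y,?READS_COUNT div Y,X),read_test(false,Y,?READS_COUNT div Y,X)}
    || X <- table_types()] 
    || Y <- [1,10,100,1000,10000]].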

And the results:

8> perf:tests().
[[{[public,set,{read_concurrency,false}],
   number_proc,1,2166000,1456000},
  {[public,set,{read_concurrency,true}],
   number_proc,1,2452000,1609000},
  {[public,set],number_proc,1,2513000,1538000}],
 [{[public,set,{read_concurrency,false}],
   number_proc,10,1153000,767000},
  {[public,set,{read_concurrency,true}],
   number_proc,10,1180000,768000},
  {[public,set],number_proc,10,1181000,784000}],
 [{[public,set,{read_concurrency,false}],
   number_proc,100,1149000,755000},
  {[public,set,{read_concurrency,true}],
   number_proc,100,1157000,747000},
  {[public,set],number_proc,100,1130000,749000}],
 [{[public,set,{read_concurrency,false}],
   number_proc,1000,1141000,756000},
  {[public,set,{read_concurrency,true}],
   number_proc,1000,1169000,748000},
  {[public,set],number_proc,1000,1146000,769000}],
 [{[public,set,{read_concurrency,false}],
   number_proc,10000,1224000,832000},
  {[public,set,{read_concurrency,true}],
   number_proc,10000,1274000,855000},
  {[public,set],number_proc,10000,1162000,826000}]]
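
One way to read these tuples is to compare the two timings in each of them: the first is the run with the lookup and the second the run without it, following the order of the `read_test(true, ...)` and `read_test(false, ...)` calls in `tests/0`. The helper below is a hypothetical sketch for that arithmetic and is not part of the answer's code; the module and function names are made up for illustration.

```erlang
-module(ets_share).
-export([non_ets_fraction/2, per_lookup_us/3]).

%% Fraction of the elapsed time that remains when the lookup is skipped,
%% i.e. the share spent on everything except ets:lookup/2.
non_ets_fraction(WithLookup, WithoutLookup) ->
    WithoutLookup / WithLookup.

%% Extra elapsed microseconds per read that the lookup adds.
%% TotalReads is the number of reads summed over all workers.
per_lookup_us(WithLookup, WithoutLookup, TotalReads) ->
    (WithLookup - WithoutLookup) / TotalReads.
```

For the 10-worker run with `{read_concurrency, false}` (1153000 and 767000 microseconds, 5000000 reads in total), `non_ets_fraction/2` gives about 0.665 and `per_lookup_us/3` gives about 0.0772. Roughly two thirds of the elapsed time is therefore spent outside the ets read. These are wall-clock figures for the whole run, not per-process CPU time.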