Erlang mnesia database access

Time: 2011-12-01 13:31:53

Tags: database erlang scalability mnesia otp

I have designed a mnesia database with 5 different tables. The idea is to simulate queries from many nodes (computers), not just one. From the terminal I can execute queries, but I need help on how I can make requests for information from many computers. I am testing for scalability and want to investigate the performance of mnesia against other databases. Any ideas will be highly appreciated.

2 answers:

Answer 0 (score: 8)

The best way to test mnesia is with intensive, threaded jobs both on the local Erlang node running mnesia and on remote nodes. Usually, you want the remote nodes to use RPC calls in which the reads and writes against your mnesia tables are executed. Of course, high concurrency comes with a trade-off: transaction speed will go down, and many transactions may be retried, because the locks may be many at a given time; but mnesia will ensure that all processes receive {atomic,ok} for each transaction they commit.
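As a rough, hedged sketch of such RPC-driven remote load (the node argument and the bench module wrapping a write/1 function are made-up illustrations; it assumes the remote node is already connected and has the module loaded):

```erlang
%% Hypothetical driver: push one write to a remote node via rpc:call/4.
%% 'bench' and its write/1 are stand-ins for your own test module.
drive_remote_write(Node, Record) ->
    case rpc:call(Node, bench, write, [Record]) of
        {badrpc, Reason} -> {error, Reason};
        Result           -> Result   %% normally {atomic, ok}
    end.
```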

The Concept
I suggest that we have a non-blocking overload, with writes and reads directed at each mnesia table by as many processes as possible. We measure the time difference between the call to the write function and the moment our massive mnesia subscriber receives the Write event. These events are sent by mnesia after every successful transaction, so we do not need to interrupt the working/overloading processes; instead, we let a "strong" mnesia subscriber wait for the asynchronous events that report successful deletes and writes.
The technique here is that we take a timestamp at the point just before the write function is called, and note down the record key and the write CALL timestamp. Our mnesia subscriber then notes down the record key and the write/read EVENT timestamp. The difference between these two timestamps (let us call it the CALL-to-EVENT Time) gives us a rough idea of how loaded, or how efficient, we are. As locks increase with concurrency, we should register an increasing CALL-to-EVENT Time. Processes doing writes (unlimited) will execute concurrently, while those doing reads will also carry on without interruption. We will choose the number of processes for each operation, but first let us lay the groundwork for the whole test case.
All of the concepts above apply to local operations (processes running on the same node as mnesia).

-> Simulating Many Nodes
Well, personally I have never simulated nodes in Erlang; I have always worked with real Erlang nodes, either on the same box or on several different machines in a networked environment. Still, I advise you to look closely at this module: http://www.erlang.org/doc/man/slave.html, and to concentrate even more on this one: http://www.erlang.org/doc/man/ct_slave.html. Also look at the following links, which talk about creating, simulating and controlling many nodes under another parent node: http://www.erlang.org/doc/man/pool.html, Erlang: starting slave node, and https://support.process-one.net/doc/display/ERL/Starting+a+set+of+Erlang+cluster+nodes. I will not dive into the jungle of Erlang nodes here, because that is another complicated topic; instead I will concentrate on tests on the same node that runs mnesia. I have laid out the mnesia test concept above, so here, let us start implementing it.
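A minimal sketch of spawning one helper node with the slave module linked above (the node name bench1 is made up; slave:start/3 only works if the parent node is itself distributed, i.e. started with -name or -sname):

```erlang
%% Start a slave node on the local host and start mnesia on it.
%% Returns the new node name, e.g. bench1@myhost.
start_test_node() ->
    Host = list_to_atom(net_adm:localhost()),
    {ok, Node} = slave:start(Host, bench1,
                             "-setcookie " ++ atom_to_list(erlang:get_cookie())),
    ok = rpc:call(Node, mnesia, start, []),
    Node.
```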

Now, first of all, you need to make a test plan for each table (separately). This should include both writes and reads. Then you need to decide whether you want to do dirty operations or transactional operations on the tables. You also need to test the speed of traversing a mnesia table in relation to its size. Let us take the example of a simple mnesia table:

-record(key_value,{key,value,instanceId,pid}).

We want a generic function for writing into our table, say like this:

write(Record)->
    %% Use mnesia:activity/4 to test several activity
    %% contexts (and if your table is fragmented)
    %% like the commented code below
    %%
    %%  mnesia:activity(
    %%      transaction, %% sync_transaction | async_dirty | ets | sync_dirty
    %%      fun(Y) -> mnesia:write(Y) end,
    %%      [Record],
    %%      mnesia_frag
    %%  )
    mnesia:transaction(fun() -> ok = mnesia:write(Record) end).

For our reads, we will have:

read(Key)->
    %% Use mnesia:activity/4 to test several activity
    %% contexts (and if your table is fragmented)
    %% like the commented code below
    %%
    %%  mnesia:activity(
    %%      transaction, %% sync_transaction | async_dirty| ets | sync_dirty
    %%      fun(Y) -> mnesia:read({key_value,Y}) end,
    %%      [Key],
    %%      mnesia_frag
    %%  )
    mnesia:transaction(fun() -> mnesia:read({key_value,Key}) end).

Now, we want to write very many records into our small table. We need a key generator. This key generator will be our own pseudo-random string generator. However, we need the generator to tell us the instant it generates a key so that we can record it, because we want to see how long it takes to write a generated key. Let us put that down like this:
timestamp()-> erlang:now().

str(XX)-> integer_to_list(XX).

%% Note: calling guid/0 from here would recurse forever,
%% since guid/0 calls generate_instance_id/0 below.
generate_instance_id()->
    random:seed(now()),
    str(crypto:rand_uniform(1, 65536 * 65536)) ++ str(erlang:phash2({self(),make_ref(),time()})).

guid()->
    random:seed(now()),
    MD5 = erlang:md5(term_to_binary({self(),time(),node(), now(), make_ref()})),
    MD5List = binary_to_list(MD5),
    F = fun(N) -> io_lib:format("~2.16.0B", [N]) end,
    L = lists:flatten([F(N) || N <- MD5List]),
    %% tell our massive mnesia subscriber about this generation
    InstanceId = generate_instance_id(),
    mnesia_subscriber ! {self(),{key,write,L,timestamp(),InstanceId}},
    {L,InstanceId}.
To do an enormous number of concurrent writes, we need a function that will be executed by the many processes we will spawn. In this function, it is desirable not to put any blocking functions such as sleep/1, usually implemented as sleep(T)-> receive after T -> true end.. Such a function makes a process suspend its execution for the given number of milliseconds. mnesia_tm does the lock control, retries, blocking, e.t.c. on behalf of the processes, in order to avoid deadlocks. Let us say we want each process to write an unlimited amount of records. Here is our function:

-define(NO_OF_PROCESSES,20).

start_write_jobs()->
    [spawn(?MODULE,generate_and_write,[]) || _ <- lists:seq(1,?NO_OF_PROCESSES)],
    ok.

generate_and_write()-> 
    %% remember that in the function ?MODULE:guid/0,
    %% we inform our mnesia_subscriber about our generated key
    %% together with the timestamp of the generation just before 
    %% a write is made.
    %% The subscriber will note this down in an ETS Table and then
    %% wait for mnesia Event about the write operation. Then it will
    %% take the event time stamp and calculate the time difference
    %% From there we can make judgement on performance. 
    %% In this case, we make the processes make unlimited writes 
    %% into our mnesia tables. Our subscriber will trap the events as soon as
    %% a successful write is made in mnesia
    %% For all keys we just write a Zero as its value
    {Key,Instance} = guid(),
    write(#key_value{key = Key,value = 0,instanceId = Instance,pid = self()}),
    generate_and_write().

In the same way, let us see how the read jobs will be done. We will have a key provider; this key provider keeps spinning around the mnesia table picking out only keys, and up and down the table it keeps spinning. Here is its code:

first()-> mnesia:dirty_first(key_value).

next(FromKey)-> mnesia:dirty_next(key_value,FromKey).

start_key_picker()-> register(key_picker,spawn(fun() -> key_picker() end)).

key_picker()->
    try ?MODULE:first() of      
        '$end_of_table' -> 
            io:format("\n\tTable is empty, my dear !~n",[]),
            %% let us throw something in there to start with
            %% (guid/0 returns {Key,InstanceId})
            {Key,Instance} = guid(),
            ?MODULE:write(#key_value{key = Key,value = 0,instanceId = Instance,pid = self()}),
            key_picker();
        Key -> wait_key_reqs(Key)
    catch
        EXIT:REASON -> 
            error_logger:error_report(["Key Picker dies",{EXIT,REASON}]),
            exit({EXIT,REASON})
    end.

wait_key_reqs('$end_of_table')->
receive
    {From,<<"get_key">>} -> 
        Key = ?MODULE:first(),
        From ! {self(),Key},
        wait_key_reqs(?MODULE:next(Key));
    {_,<<"stop">>} -> exit(normal)
end;
wait_key_reqs(Key)->
receive
    {From,<<"get_key">>} -> 
        From ! {self(),Key},
        NextKey = ?MODULE:next(Key),
        wait_key_reqs(NextKey);
    {_,<<"stop">>} -> exit(normal)
end.

key_picker_rpc(Command)->
    try erlang:send(key_picker,{self(),Command}) of
        _ -> 
            receive
                {_,Reply} -> Reply
            after timer:seconds(60) -> 
                %% key_picker hang, or too busy
                erlang:throw({key_picker,hanged})
            end
    catch
        _:_ -> 
            %% key_picker dead
            start_key_picker(),
            sleep(timer:seconds(5)),
            key_picker_rpc(Command)
    end.
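Note that key_picker_rpc/1 uses the sleep/1 helper that the text warned against in hot paths; it is fine here because this is recovery code. For completeness, the usual implementation mentioned earlier:

```erlang
%% Block the calling process for T milliseconds.
sleep(T) -> receive after T -> true end.
```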

%% Now, this is where the reader processes will be
%% accessing keys. It will appear to them as though
%% it is random, because a single process is doing
%% the traversal. It is all a game of chance,
%% depending on the scheduler's choice of
%% who gets the next read chance; whoever does,
%% wins! Okay, let us get going below :)

get_key()-> 
    Key = key_picker_rpc(<<"get_key">>),

    %% lets report to our "massive" mnesia subscriber
    %% about a read which is about to happen
    %% together with a time stamp.
    Instance = generate_instance_id(),
    mnesia_subscriber ! {self(),{key,read,Key,timestamp(),Instance}},
    {Key,Instance}. 

Whew! Now we need to create the function with which we will start all the readers.

-define(NO_OF_READERS,10).

start_read_jobs()->
    [spawn(?MODULE,constant_reader,[]) || _ <- lists:seq(1,?NO_OF_READERS)],
    ok.

constant_reader()->
    {Key,InstanceId} = ?MODULE:get_key(),
    Record = ?MODULE:read(Key),
    %% Tell mnesia_subscriber that a read has been done so it creates timestamp
    mnesia:report_event({read_success,Record,self(),InstanceId}),   
    constant_reader().

Now, the biggest part: the mnesia_subscriber !!! This is a simple process that subscribes to simple events. Get the mnesia events documentation from the Mnesia User's Guide. Here is the mnesia subscriber:

-record(read_instance,{
        instance_id,
        before_read_time,
        after_read_time,
        read_time       %% after_read_time - before_read_time

    }).

-record(write_instance,{
        instance_id,
        before_write_time,
        after_write_time,
        write_time          %% after_write_time - before_write_time
    }).

-record(benchmark,{
        id,         %% {pid(),Key}
        read_instances = [],
        write_instances = []
    }).

subscriber()->
    mnesia:subscribe({table,key_value, simple}),

    %% lets also subscribe for system
    %% events because events passing through
    %% mnesia:event/1 will go via
    %% system events. 

    mnesia:subscribe(system),
    wait_events().

-include_lib("stdlib/include/qlc.hrl").

wait_events()->
receive
    {From,{key,write,Key,TimeStamp,InstanceId}} -> 
        %% A process is just about to call
        %% mnesia:write/1 and so we note this down
        Fun = fun() -> 
                case qlc:e(qlc:q([X || X <- mnesia:table(benchmark),X#benchmark.id == {From,Key}])) of
                    [] -> 
                        ok = mnesia:write(#benchmark{
                                id = {From,Key},
                                write_instances = [
                                        #write_instance{
                                            instance_id = InstanceId,
                                            before_write_time = TimeStamp                                               
                                        }]
                                }),
                                ok;
                    [Here] -> 
                        WIs = Here#benchmark.write_instances,
                        NewInstance = #write_instance{
                                        instance_id = InstanceId,
                                        before_write_time = TimeStamp                                               
                                    },
                        ok = mnesia:write(Here#benchmark{write_instances = [NewInstance|WIs]}),
                        ok                          
                end
            end,
        mnesia:transaction(Fun),
        wait_events();      
    {mnesia_table_event,{write,#key_value{key = Key,instanceId = I,pid = From},_ActivityId}} ->
        %% A process has successfully made a write. So we look it up and 
        %% get timeStamp difference, and finish bench marking that write
        WriteTimeStamp = timestamp(),
        F = fun()->
                [Here] = mnesia:read({benchmark,{From,Key}}),
                WIs = Here#benchmark.write_instances,
                {_,WriteInstance} = lists:keysearch(I,2,WIs),
                BeforeTmStmp = WriteInstance#write_instance.before_write_time,
                NewWI = WriteInstance#write_instance{
                            after_write_time = WriteTimeStamp,
                            write_time = time_diff(WriteTimeStamp,BeforeTmStmp)
                        },
                ok = mnesia:write(Here#benchmark{write_instances = [NewWI|lists:keydelete(I,2,WIs)]}),
                ok
            end,
        mnesia:transaction(F),
        wait_events();      
    {From,{key,read,Key,TimeStamp,InstanceId}} ->
        %% A process is just about to do a read
        %% using mnesia:read/1 and so we note this down
        Fun = fun()-> 
                case qlc:e(qlc:q([X || X <- mnesia:table(benchmark),X#benchmark.id == {From,Key}])) of
                    [] -> 
                        ok = mnesia:write(#benchmark{
                                id = {From,Key},
                                read_instances = [
                                        #read_instance{
                                            instance_id = InstanceId,
                                            before_read_time = TimeStamp                                                
                                        }]
                                }),
                                ok;
                    [Here] -> 
                        RIs = Here#benchmark.read_instances,
                        NewInstance = #read_instance{
                                        instance_id = InstanceId,
                                        before_read_time = TimeStamp                                            
                                    },
                        ok = mnesia:write(Here#benchmark{read_instances = [NewInstance|RIs]}),
                        ok
                end
            end,
        mnesia:transaction(Fun),
        wait_events();
    {mnesia_system_event,{mnesia_user,{read_success,#key_value{key = Key},From,I}}} ->
        %% A process has successfully made a read. So we look it up and 
        %% get timeStamp difference, and finish bench marking that read
        ReadTimeStamp = timestamp(),
        F = fun()->
                [Here] = mnesia:read({benchmark,{From,Key}}),
                RIs = Here#benchmark.read_instances,
                {_,ReadInstance} = lists:keysearch(I,2,RIs),
                BeforeTmStmp = ReadInstance#read_instance.before_read_time,
                NewRI = ReadInstance#read_instance{
                            after_read_time = ReadTimeStamp,
                            read_time = time_diff(ReadTimeStamp,BeforeTmStmp)
                        },
                ok = mnesia:write(Here#benchmark{read_instances = [NewRI|lists:keydelete(I,2,RIs)]}),
                ok
            end,
        mnesia:transaction(F),
        wait_events();  
    _ -> wait_events()
end.

time_diff(After, Before)->
    %% both arguments are erlang:now/0 timestamps;
    %% timer:now_diff/2 returns the difference in microseconds
    timer:now_diff(After, Before).


Alright! That was huge :) So we are done with the subscriber. We need to put the code together and run the necessary tests.

install()->
    mnesia:stop(),
    mnesia:delete_schema([node()]),
    mnesia:create_schema([node()]),
    mnesia:start(),
    {atomic,ok} = mnesia:create_table(key_value,[
        {attributes,record_info(fields,key_value)},
        {disc_copies,[node()]}
    ]),
    {atomic,ok} = mnesia:create_table(benchmark,[
        {attributes,record_info(fields,benchmark)},
        {disc_copies,[node()]}
    ]),
    mnesia:stop(),
    ok.

start()->
    mnesia:start(),
    ok = mnesia:wait_for_tables([key_value,benchmark],timer:seconds(120)),
    %% boot up our subscriber
    register(mnesia_subscriber,spawn(?MODULE,subscriber,[])),
    start_write_jobs(),
    start_key_picker(),
    start_read_jobs(),
    ok.

Now, with proper analysis of the benchmark table records, you will get a picture of the average read times, average write times, e.t.c. You can draw a graph of these times against an increasing number of processes; as we increase the number of processes, you will discover that the read and write times increase. Take the code, read it, and play with it. You may not use all of it, but I am sure you will pick up new concepts from it, just as others are sending in solutions. Using mnesia events is the best way to test mnesia reads and writes without blocking the processes doing the actual writing or reading. In the example above, the read and write processes are not under any control; in fact, they will run forever until you terminate the VM. You can traverse the benchmark table with a good formula, making use of the read and write times of each read or write instance, and then compute averages, variances, e.t.c.
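As a hedged sketch of such an analysis (assuming a completed write instance has a non-undefined write_time and that the stored values are microseconds), one could fold over the benchmark table and average the recorded write times:

```erlang
%% Hypothetical helper: averages write_time over all completed
%% write instances in the benchmark table. Incomplete instances
%% (write_time still undefined) are skipped.
average_write_time() ->
    F = fun() ->
            mnesia:foldl(
                fun(#benchmark{write_instances = WIs}, {Sum, Count}) ->
                    Done = [T || #write_instance{write_time = T} <- WIs,
                                 T =/= undefined],
                    {Sum + lists:sum(Done), Count + length(Done)}
                end,
                {0, 0},
                benchmark)
        end,
    case mnesia:transaction(F) of
        {atomic, {_, 0}}        -> no_samples;
        {atomic, {Sum, Count}}  -> Sum / Count;
        Error                   -> Error
    end.
```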

---

Testing on remote machines, simulating nodes, and benchmarking against other DBMSs may not be relevant, for many reasons. The concepts, motivations, and goals of Mnesia are very different from several existing database types: document-oriented DBs, RDBMSs, object-oriented databases, e.t.c. In fact, mnesia is to be compared with a database such as this one: http://www.berabera.info/oldblog/lenglet/howtos/erlangkerberosremctl/index.html. It is a distributed DBMS with a hybrid/unstructured data structure that belongs to the language Erlang. Benchmarking Mnesia against other types of databases may not be right, because its purpose is very different from many of theirs, and its tight coupling with Erlang/OTP sets it apart. However, knowledge of how mnesia works, of its transaction contexts, indexing, concurrency, and distribution, can be key to a good database design. Mnesia can store very complex data structures. Remember, the more complex a data structure is, with nested information, the more work is required to unpack it and extract the information you need at runtime, which means more CPU cycles and memory. Sometimes, normalization with mnesia may simply result in poor performance, so the implementation of its concepts stays far away from other databases.
It is good that you are interested in Mnesia performance across several machines (distributed); however, the performance is only as good as that of Distributed Erlang. The great thing is that atomicity is ensured for every transaction. Concurrent requests from remote nodes can still be sent via RPC calls. Remember that if you have multiple replicas of mnesia on different machines, processes running on each node will write to their own node, and mnesia will then carry on with its replication from there. Mnesia is very fast at replication, unless the network is really bad and/or the nodes are not connected, or the network is partitioned at runtime.
Mnesia ensures consistency and atomicity of CRUD operations. For this reason, a replicated mnesia database depends heavily on network availability for better performance. As long as the Erlang nodes remain connected, two or more Mnesia nodes will always have the same data. A read on one node will ensure that you get the most recent information. Problems arise when a disconnection occurs and each node registers the other as being down. More information on mnesia's performance can be found in the following links:

http://igorrs.blogspot.com/2010/05/mnesia-one-year-later.html
http://igorrs.blogspot.com/2010/05/mnesia-one-year-later-part-2.html
http://igorrs.blogspot.com/2010/05/mnesia-one-year-later-part-3.html

Hence, the concept behind mnesia can only be compared with Ericsson's NDB database: http://igorrs.blogspot.com/2009/11/consistent-hashing-for-mnesia-fragments.html, but not with existing RDBMSs or document-oriented databases, e.t.c. Those are my thoughts :) Let us wait for what others have to say...

Answer 1 (score: 0)

Start the additional nodes with commands like this:

erl -name test1@127.0.0.1 -cookie devel \
    -mnesia extra_db_nodes "['devel@127.0.0.1']"\
    -s mnesia start

where 'devel@127.0.0.1' is the node on which mnesia is already set up. In this case, all tables will be accessed from the remote node, but you can make local copies with mnesia:add_table_copy/3.
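A hedged sketch of making such a local copy (table name taken from the answer above; run it on the new node once it has joined):

```erlang
%% Copy the key_value table to this node as a RAM copy.
%% mnesia:add_table_copy/3 returns {atomic, ok} on success;
%% disc_copies could be used instead of ram_copies.
add_local_copy() ->
    case mnesia:add_table_copy(key_value, node(), ram_copies) of
        {atomic, ok}                     -> ok;
        {aborted, {already_exists, _, _}} -> ok;
        {aborted, Reason}                -> {error, Reason}
    end.
```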

Then you can use spawn/2 or spawn/4 to start generating load on all the nodes, for example:

lists:foreach(fun(N) ->
                  spawn(N, fun () ->
                               %% generate some load
                               ok
                           end)
              end,
              [ 'test1@127.0.0.1', 'test2@127.0.0.1' ]).