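I'm working on a project for a course on the MapReduce algorithm, so I built an ETS table in Erlang from a large data file, and I want to process it concurrently. The resulting table is very large. Is there a way to split one large table into several smaller sub-tables so that I can search them concurrently using the MapReduce algorithm? Thanks.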
Answer 0: (score: 1)
You can search an ETS table concurrently without splitting it; see the read_concurrency option:
http://www.erlang.org/doc/man/ets.html#new_2_read_concurrency
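As a rough illustration of the point above, here is a minimal sketch of several processes reading one table at the same time. The module name, table contents, and key slices are all made up for the example; only `ets:new/2` with `{read_concurrency, true}`, `ets:insert/2`, and `ets:lookup/2` are real ETS calls. Each worker covers a slice of the key range (the "map" step) and the parent adds up the partial results (the "reduce" step).

```erlang
-module(ets_concurrent_read).
-export([demo/0]).

%% Hypothetical demo: one shared table, four concurrent readers.
demo() ->
    %% protected (the default): only the owner writes, any process may read.
    Tab = ets:new(big_table, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, [{N, N * N} || N <- lists:seq(1, 1000)]),
    Parent = self(),
    %% Assumed partitioning of the key space; pick whatever suits your data.
    Slices = [{1, 250}, {251, 500}, {501, 750}, {751, 1000}],
    Pids = [spawn_link(fun() ->
                %% "Map": each worker sums the values in its own slice.
                Sum = lists:sum([V || K <- lists:seq(From, To),
                                      {_, V} <- ets:lookup(Tab, K)]),
                Parent ! {self(), Sum}
            end) || {From, To} <- Slices],
    %% "Reduce": wait for every worker and combine the partial sums.
    Total = lists:sum([receive {Pid, Sum} -> Sum end || Pid <- Pids]),
    ets:delete(Tab),
    Total.
```

Calling `ets_concurrent_read:demo()` returns the sum of the squares of 1..1000. The table itself is never split; only the work is.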
If the table is large, I suggest using a good match pattern to reduce the amount of data that has to be searched: http://www.erlang.org/doc/man/ets.html#select-2
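For instance, a match specification passed to `ets:select/2` can filter rows inside ETS and return only the fields you need. The sketch below uses an invented table of `{Name, Age, Sex}` tuples; the module name and data are assumptions made for illustration.

```erlang
-module(ets_select_demo).
-export([demo/0]).

%% Hypothetical demo: select the names of everyone older than 30.
demo() ->
    Tab = ets:new(people, [set]),
    true = ets:insert(Tab, [{"Joe", 31, male},
                            {"Ann", 25, female},
                            {"Mike", 42, male}]),
    %% Match spec = [{Pattern, Guards, Result}]:
    %%   Pattern binds the name to '$1' and the age to '$2',
    %%   the guard keeps rows with age > 30,
    %%   the result returns just the name.
    MatchSpec = [{{'$1', '$2', '_'}, [{'>', '$2', 30}], ['$1']}],
    %% A set table has no defined order, so sort for a stable result.
    Names = lists:sort(ets:select(Tab, MatchSpec)),
    ets:delete(Tab),
    Names.
```

Here `ets_select_demo:demo()` returns `["Joe","Mike"]`. If the pattern binds the key itself, ETS can go straight to the matching object instead of scanning the whole table, which is where a well-chosen pattern pays off most.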
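Answer 1: (score: 1)
I once worked on an intranet application in which I had to keep data in RAM most of the time. I created a stable caching library that helped me abstract away the ETS mechanisms. In this library, I create worker gen_servers whose job is to create, own, and expose methods for ETS tables. I named them cache1 and cache2. The two keep transferring table ownership to each other in a redundant manner in case one of them runs into a problem. Get the application: http://www.4shared.com/zip/z_VgKLpa/cache-10.html
Just unzip it, recompile it using the Emakefile, and put it in your Erlang lib directory. To see how it works, here is a shell session.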

    F:\programming work\cache-1.0>erl -pa ebin
    Eshell V5.9  (abort with ^G)
    1> application:start(cache).
    ok
    2> rd(student,{name,age,sex}).
    student
    3> cache_server:new(student,set,2).
    ok
    4> cache_server:write(#student{name = "Muzaaya Joshua", sex = "Male",age = (2012 - 1987) }).
    ok
    5> cache_server:write(student,[#student{name = "Joe",sex = "Male"}, #student{name = "Mike",sex = "Male"}]).
    ok
    6> cache_server:read({student,"Muzaaya Joshua"}).
    [#student{name = "Muzaaya Joshua",age = 25,sex = "Male"}]
    7> cache_server:read({student,"Joe"}).
    [#student{name = "Joe",age = undefined,sex = "Male"}]
    8> cache_server:get_tables().
    [{cache1,[student]},{cache2,[]}]
    9> rd(class,{class,no_of_students}).
    class
    10> cache_server:get_tables().
    [{cache1,[student]},{cache2,[]}]
    11> cache_server:new(class,set,2).
    ok
    12> cache_server:get_tables().
    [{cache1,[student]},{cache2,[class]}]
    13> cache_server:write(class,[ #class{class = "Primary " ++ integer_to_list(N), no_of_students = random:uniform(50)} || N <- lists:seq(1,7)]) .
    ok
    14> cache_server:read({class,"Primary 6"}).
    [#class{class = "Primary 6",no_of_students = 30}]
    15> cache_server:delete({class,"Primary 2"}).
    ok
    16> cache_server:get_cache_state().
    [{server_state,cache1,1,[student]},
     {server_state,cache2,1,[class]}]
    17> rd(food,{name,type,value}).
    food
    18> cache_server:new(food,set,2).
    ok
    19> cache_server:write(food,[#food{name = "Orange", type = "fruit",value = "Vitamin C"}]).
    ok
    20> cache_server:get_cache_state().
    [{server_state,cache1,2,[food,student]},
     {server_state,cache2,1,[class]}]

Now, to understand the importance of ets:give_away/3, let's see what happens when cache1 or cache2 crashes. Remember that the current server state (showing the current owner of each table) is:

    21> cache_server:get_cache_state().
    [{server_state,cache1,2,[food,student]},
     {server_state,cache2,1,[class]}]

Let me crash cache1 and see what happens:

    22> gen_server:cast(cache1,stop).
    ok
    Cache Server: cache2 has taken over table: food from server: cache1
    23>
    Cache Server: cache2 has taken over table: student from server: cache1
    23> cache_server:get_cache_state().
    [{server_state,cache1,0,[]},
     {server_state,cache2,3,[student,food,class]}]

And the other one:

    24> gen_server:cast(cache2,stop).
    ok
    Cache Server: cache1 has taken over table: student from server: cache2
    25>
    Cache Server: cache1 has taken over table: food from server: cache2
    25>
    Cache Server: cache1 has taken over table: class from server: cache2
    25> cache_server:get_cache_state().
    [{server_state,cache1,3,[class,food,student]},
     {server_state,cache2,0,[]}]

That's it! You can use the concepts in the source code to build something of your own. The ETS tables created by this library are public and named, so you can also access them directly with the ETS functions.
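
The library's source is not shown here, so the following is not its actual code; it is only a minimal sketch of the underlying idea, assuming a named public table and a hypothetical second process standing in for the other cache server. `ets:give_away/3` hands ownership to another live process, which is notified with an `'ETS-TRANSFER'` message; the module name, table name `demo_cache`, and gift data `handover` are invented for the example.

```erlang
-module(ets_handover).
-export([demo/0]).

%% Hypothetical demo: transfer a table to another process and keep reading it.
demo() ->
    Parent = self(),
    %% Stand-in for the "other" cache server: waits to be given a table.
    NewOwner = spawn(fun() ->
        receive
            {'ETS-TRANSFER', Tab, _FromPid, GiftData} ->
                Parent ! {took_over, Tab, GiftData},
                %% Stay alive so the table is not destroyed with its owner.
                receive stop -> ok end
        end
    end),
    %% The calling process is the initial owner.
    demo_cache = ets:new(demo_cache, [set, public, named_table]),
    true = ets:insert(demo_cache, {"Joe", 25}),
    %% Hand the table over; the data in it is untouched.
    true = ets:give_away(demo_cache, NewOwner, handover),
    receive {took_over, _Tab, handover} -> ok end,
    OwnerChanged = (ets:info(demo_cache, owner) =:= NewOwner),
    %% Public + named: the old owner can still read it by name.
    Rows = ets:lookup(demo_cache, "Joe"),
    NewOwner ! stop,
    {OwnerChanged, Rows}.
```

`ets_handover:demo()` returns `{true,[{"Joe",25}]}`. Note that `ets:give_away/3` must be called by the current owner while it is still running; to survive an abrupt crash of the owner, ETS also offers the `{heir, Pid, HeirData}` table option, which delivers the same kind of `'ETS-TRANSFER'` message to the heir when the owner terminates.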