"Fail to start beam on host" and "Can't start newbeam on host"

Time: 2012-07-16 16:11:02

Tags: amazon-ec2 erlang load-testing tsung beam

I am running into the following problem:

==> 20120712-1611/tsung_controller@tester0.log <==

=INFO REPORT==== 12-Jul-2012::16:12:45 ===
   ts_config_server:(0:<0.100.0>) Can't start newbeam on host tester1 (reason: timeout) ! Aborting!

=INFO REPORT==== 12-Jul-2012::16:12:45 ===
   ts_config_server:(0:<0.99.0>) Can't start newbeam on host tester2 (reason: timeout) ! Aborting!

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.74.0>) Fail to start beam on host "web1-1b" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.74.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.372>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"web1-1b",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"web1-1b",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.82.0>) Fail to start beam on host "master3" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.82.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.405>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"master3",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master3",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.80.0>) Fail to start beam on host "master1" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.80.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.397>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"master1",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master1",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.81.0>) Fail to start beam on host "master2" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.81.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.400>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"master2",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master2",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.78.0>) Fail to start beam on host "memcache-1a" ({error,
                                        timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.78.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.386>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"memcache-1a",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"memcache-1a",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.79.0>) Fail to start beam on host "memcache-1b" ({error,
                                        timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.79.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.392>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"memcache-1b",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"memcache-1b",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.76.0>) Fail to start beam on host "task1" ({error,
                                      timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.76.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.374>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"task1",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"task1",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.77.0>) Fail to start beam on host "ffmpeg1" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.77.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.380>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"ffmpeg1",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"ffmpeg1",
                                   {},10000,
                                   {global,
                                    ts_mon}}
=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(3:<0.73.0>) Fail to start beam on host "web1-1a" ({error,
                                    timeout})

=ERROR REPORT==== 12-Jul-2012::16:12:46 ===
** Generic server <0.73.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.364>,start_beam}
** When Server state == {state,{global,ts_mon},
                  10000,undefined,"web1-1a",undefined}
** Reason for termination == 
** {error,timeout}

=INFO REPORT==== 12-Jul-2012::16:12:46 ===
   ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"web1-1a",
                                   {},10000,
                                   {global,
                                    ts_mon}}

This is my /etc/hosts file (on tester0):

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

10.0.3.192 vm0
10.0.3.199 vm1
10.0.3.242 vm2

10.100.238.56 master1
10.90.245.66 master2
10.70.78.51 master3
10.37.53.46 web1-1a
10.94.245.79 web1-1b
10.127.46.19 task1
10.35.99.161 ffmpeg1
10.243.63.212 memcache-1a
10.223.50.72 memcache-1b
10.29.155.171 tester0
10.78.159.23 tester1
10.78.149.115 tester2

Whenever I start the instances (they all run the same version of Erlang, which I build from source), I run this script:

#!/bin/bash

for i in web1-1a web1-1b task1 ffmpeg1 master1 master2 master3 memcache-1a memcache-1b tester1 tester2; do
    ssh $i -i ~/.ssh/amazon-key.pem "echo \"<MY PUB SSH KEY IN HERE>\" | tee -a ~/.ssh/authorized_keys; ssh-keygen -t rsa << hereintime



hereintime; sudo hostname $i; exit" &> /dev/null
    ssh $i "echo \"host *
    user <myuser>
    StrictHostKeyChecking no\" | tee -a .ssh/config; sudo sed -i.bak -e \"s/localhost/localhost $i/\" -e \"/$i/d\" /etc/hosts; echo \"# need to have ssh-agent running
eval \`ssh-agent\`
[ -e /home/<myuser>/.ssh/id_rsa.pub ] && ssh-add\" | tee -a ~/.bashrc" &> /dev/null
    newhostline=`grep $i /etc/hosts`
    ssh $i "sudo sed -i -e \"/$i/d\" /etc/hosts; echo $newhostline | sudo tee -a /etc/hosts" &> /dev/null
    [ "${i:0:-1}" == "tester" ] && tester0=`grep tester0 /etc/hosts` && ssh $i "sudo sed -i -e '/tester0/d' /etc/hosts" &> /dev/null
    ssh $i "rm ~/.ssh/known_hosts; echo $tester0 | sudo tee -a /etc/hosts; ssh tester0 \"exit\""  &> /dev/null
    ssh $i "cat ~/.ssh/id_rsa.pub" | tee -a ~/.ssh/authorized_keys
    ssh $i "sudo hostname $i; exit"
done

I am able to run the tests described in your documentation, for example:

# ssh tester1 erl
Eshell V5.9.1  (abort with ^G)
1> inet:gethostname().
{ok,"tester1"}

as well as what is described on this page: https://support.process-one.net/doc/display/ERL/Starting+a+set+of+Erlang+cluster+nodes

-module(cluster).
-export([slaves/1]).

%% Argument:
%% %% Hosts: List of hostname (string)
slaves([]) ->
ok;
slaves([Host|Hosts]) ->
 Args = erl_system_args(),
 NodeName = "cluster",
 {ok, Node} = slave:start_link(Host, NodeName, Args),
 io:format("Erlang node started = [~p]~n", [Node]),
 slaves(Hosts).

erl_system_args()->
 Shared = case init:get_argument(shared) of
   error -> " ";
   {ok,[[]]} -> " -shared "
 end,
   lists:append(["-rsh ssh -setcookie",
         atom_to_list(erlang:get_cookie()),
         Shared, " +Mea r10b "]).

%% Do not forget to start erlang with a command like:
%% erl -rsh ssh -sname clustmaster

Then I ran (on tester0):

# erl -rsh ssh -sname clustmaster
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [async-threads:0] [kernel-poll:false]

Eshell V5.9.1  (abort with ^G)
(clustmaster@tester0)1> c(cluster).
{ok,cluster}
(clustmaster@tester0)2> cluster:slaves(["tester1","tester2"]).
** exception error: no match of right hand side value {error,timeout}
    in function  cluster:slaves/1 (cluster.erl, line 11)
(clustmaster@tester0)3> cluster:slaves(["tester0"]).          
Erlang node started = [cluster@tester0]
ok

That makes sense, because:

(clustmaster@tester0)14> slave:start_link("tester0", "cluster", " -rsh ssh -setcookieVTJKCGTPGNTMRAUDYLBU +Mea r10b").   
{ok,cluster@tester0}
(clustmaster@tester0)15> slave:start_link("tester1", "cluster", " -rsh ssh -setcookieVTJKCGTPGNTMRAUDYLBU +Mea r10b").
{error,timeout}

Which is odd, since name resolution and ping from tester0 both work:

(clustmaster@tester0)5> inet:gethostbyname("tester1").
{ok,{hostent,"tester1",[],inet,4,[{10,78,159,23}]}}
(clustmaster@tester0)6> inet:gethostbyname("tester2").
{ok,{hostent,"tester2",[],inet,4,[{10,78,149,115}]}}

# ping -c 1 tester1
PING tester1 (10.78.159.23) 56(84) bytes of data.
64 bytes from tester1 (10.78.159.23): icmp_req=1 ttl=56 time=1.69 ms
# ping -c 1 tester2
PING tester2 (10.78.149.115) 56(84) bytes of data.
64 bytes from tester2 (10.78.149.115): icmp_req=1 ttl=56 time=2.03 ms

1 answer:

Answer 0 (score: 1)

I found a problem in my script, the one that imports the SSH key onto the remote servers. I was using:

sudo sed -i.bak -e \"s/localhost/localhost $i/\" -e \"/$i/d\" /etc/hosts;

[ "${i:0:-1}" == "tester" ] && tester0=`grep tester0 /etc/hosts` && ssh $i "sudo sed -i -e '/tester0/d' /etc/hosts" &> /dev/null

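which removed the address of my "controller" (tester0) from /etc/hosts on the remote servers. Presumably the remote node could then no longer resolve tester0 to connect back to it, which would explain the timeout. Now that this has been removed from the script, I can run: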

slave:start_link(Host, Name, Args)
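
For anyone hitting the same timeout, here is a minimal sketch of how the missing entry could be detected and restored without a `sed ... /d` that deletes lines. The function names `hosts_has_name` and `hosts_add_if_missing` are hypothetical helpers of my own, not part of Tsung or Erlang, and the sketch assumes a plain hosts-style file (address first, then names, `#` for comments).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not from Tsung or Erlang.

# hosts_has_name FILE NAME
# Succeeds if a non-comment line in FILE lists NAME as an exact
# hostname or alias (any whitespace-separated field after the address).
hosts_has_name() {
    local file=$1 name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }   # ignore comments; awk re-splits the fields
        { for (f = 2; f <= NF; f++) if ($f == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# hosts_add_if_missing FILE ADDRESS NAME
# Appends "ADDRESS NAME" only when NAME is not already listed.
# It never deletes a line, so it cannot drop the controller entry.
hosts_add_if_missing() {
    local file=$1 addr=$2 name=$3
    hosts_has_name "$file" "$name" || printf '%s %s\n' "$addr" "$name" >> "$file"
}

# Example (run on a remote host, with write access to the file):
#   hosts_add_if_missing /etc/hosts 10.29.155.171 tester0
```

Because the match is on whole fields, a name such as `tester1` does not match `tester10`, unlike a bare `sed "/tester1/d"`. A quick check from the controller that a remote host can resolve it would be something like `ssh tester1 getent hosts tester0`, assuming `getent` is available on the remote system.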