Unable to get DataStax OpsCenter 4.0.3 to connect to a Cassandra 2.0.4 cluster

Asked: 2014-01-30 06:54:10

Tags: cassandra datastax opscenter

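I have a 1-node C* 2.0.4 cluster running, and nodetool status shows a healthy cluster.

I then installed OpsCenter 4.0.3 on another machine on the same network using 'sudo yum install opscenter-free'.

In the opscenterd.conf file, I set interface = 'public IP of the OpsCenter server' and started the OpsCenter server.

I was then able to see the OpsCenter web page and clicked "Use Existing Cluster".

In the Add Cluster dialog, I entered the rpc_address of the single-node Cassandra cluster. OpsCenter accepted it and correctly displayed the cluster name on the next page.

However, none of the graphs in OpsCenter load, and I see the error: 0 of 0 agents connected. I also see a flashing red X with a plug icon at the top.

The CentOS firewall is currently turned off on both the OpsCenter machine and the C* node.

How can I get OpsCenter to connect to the C* node correctly?

Here is what the OpsCenter log shows (note: I replaced the IP with A.B.C.D):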

2014-01-30 06:43:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: HTTP request http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D failed: Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.

On the Cassandra node, everything looks healthy:

[root@cassandra01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  A.B.C.D  158.27 KB  256     100.0%  bc560cd6-a20d-4b36-99ca-ed477dc939b5  rack1

However, I cannot curl the URL that OpsCenter is trying to reach:

[root@cassandra01 ~]# curl http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D
curl: (7) couldn't connect to host

SSL is turned off in the opscenterd.conf file (the default setting).

Here is what I see at the URL http://<public IP of OpsCenter>:8888/Dog/nodes

[{"load": null, "has_jna": false, "vnodes": true, "devices": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "task_progress": {}, "node_ip": "A.B.C.D", "network_interfaces": null, "ec2": {}, "node_version": {}, "dc": null, "node_name": null, "num_procs": null, "streaming": {}, "token": "5743408169174478324", "data_held": null, "mode": "unknown", "rpc_ip": "10.183.132.141", "partitions": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "os": null, "rack": null, "last_seen": 0}]

Any ideas on how to fix this?

Note that in Cassandra's YAML file, rpc_server_type is set to sync.


UPDATE

I also tried manually installing the OpsCenter agent on the C* node using "yum install datastax-agent" and then editing the address.yaml file with the following settings:

stomp_interface: 'public ip of machine opscenterd is running on (public IP)'
local_interface: 'listen_address in cassandra.yaml (public IP)'
agent_rpc_interface: 'rpc address in cassandra.yaml (private IP network)'
agent_rpc_broadcast_address: 'private network IP, same network as rpc address'
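Since each address.yaml setting above is meant to mirror a value from cassandra.yaml, it can help to print those values before editing. The following is only a rough sketch: the `yaml_value` helper is hypothetical, the default path `/etc/cassandra/conf/cassandra.yaml` is an assumption about the package layout, and the helper only handles simple top-level `key: value` lines (it does not strip quotes or trailing comments).

```shell
#!/bin/sh
# Hypothetical helper: print the value of a top-level "key: value" line.
yaml_value() {
    key=$1
    file=$2
    sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1
}

# Assumed location of cassandra.yaml; override with CASSANDRA_YAML if needed.
CASSANDRA_YAML=${CASSANDRA_YAML:-/etc/cassandra/conf/cassandra.yaml}

if [ -r "$CASSANDRA_YAML" ]; then
    # local_interface in address.yaml is described above as listen_address.
    echo "listen_address: $(yaml_value listen_address "$CASSANDRA_YAML")"
    # agent_rpc_interface is described above as the rpc address.
    echo "rpc_address:    $(yaml_value rpc_address "$CASSANDRA_YAML")"
fi
```

Comparing this output with address.yaml makes it easier to spot a public/private IP mix-up.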

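I tried a few different settings in the address.yaml file, but none of them worked. For example, I tried setting only stomp_interface and removing the other 3 lines. That did not work. I also tried setting only the stomp and local interfaces, but that did not work either.

Now, when I start the DataStax agent with 'service datastax-agent start', the Cassandra service suddenly crashes:

[root@cassandra01 ~]# sudo service cassandra status
cassandra dead but pid file exists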

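When the C* service crashes, the OpsCenter agent keeps running. If I stop the agent service and start the C* service again (sudo service cassandra status), C* starts successfully and nodetool status shows a healthy 1-node cluster. But as soon as I start the agent service, the C* service suddenly crashes. Every setting I have tried in the address.yaml file leads to the same behavior.

Ideally, I would rather not install the agent manually and would prefer to push its installer from the OpsCenter GUI to the C* node. Since that did not work, I tried installing the agent manually and connecting it to OpsCenter, but unfortunately that did not work either.

When the Cassandra service crashes, I sometimes see this on the Cassandra node:

[root@cassandra01 ~]# sudo service cassandra stop
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Usage: cassandra start|stop|status|restart|reload

Here is what the Cassandra node's log4j-server.properties contains: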

log4j.rootLogger=INFO,stdout,R

# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n

# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log

# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG

# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR

Finally, here is what agent.log from the OpsCenter agent running on the Cassandra node shows:

nohup: ignoring input
Starting DataStax agent monitor datastax_agent_monitor
 INFO [main] 2014-01-30 08:24:59,104 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
 INFO [main] 2014-01-30 08:24:59,261 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
 INFO [main] 2014-01-30 08:24:59,546 Default config values: {:rollups300_ttl 2419200, :settings_cf "settings", :agent_rpc_interface "10.183.132.141", :my_channel_prefix "/agent", :poll_period 60, :kerberos_hostname nil, :storage_dc nil, :thrift_conn_timeout 10000, :thrift_max_frame_size 15728640, :rollups60_ttl 604800, :stomp_port 61620, :shorttime_interval 10, :longtime_interval 300, :private-conf-props ["initial_token" "listen_address" "broadcast_address" "rpc_address"], :thrift_port 9160, :async_retry_timeout 5, :agent-conf-group "global-cluster-agent-group", :jmx_host "127.0.0.1", :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :async_queue_size 5000, :autodiscovery_interval 120, :rollups7200_ttl 31536000, :autodiscovery_enabled true, :thrift_ssl_truststore nil, :rollup_snapshot_period 300, :is_package true, :monitor_command "/usr/share/datastax-agent/bin/datastax_agent_monitor", :thrift_socket_timeout 5000, :cassandra_log_location "/var/log/cassandra/system.log", :local_interface "23.253.64.169", :jmx_port 7199, :jmx_metrics_threadpool_size 4, :use_ssl 0, :rollups86400_ttl -1, :nodedetails_threadpool_size 3, :api_port 61621, :kerberos_service nil, :kerberos_client_principal nil, :jmx_thread_pool_size 5, :production 1, :stomp_interface "166.78.186.184", :storage_keyspace "OpsCenter", :rollup_snapshot_threshold 300, :thrift_ssl_truststore_type "JKS", :realtime_interval 5}
 INFO [main] 2014-01-30 08:24:59,554 Waiting for the config from OpsCenter
 INFO [main] 2014-01-30 08:24:59,559 Using 23.253.64.169 as the cassandra broadcast address
 INFO [main] 2014-01-30 08:24:59,568 New JMX connection (127.0.0.1:7199)
ERROR [main] 2014-01-30 08:25:00,019 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [main] 2014-01-30 08:25:00,414 cassandra RPC address is  nil
 INFO [main] 2014-01-30 08:25:00,418 agent RPC broadcast address is  10.183.132.141
 INFO [main] 2014-01-30 08:25:00,474 Clearing ssl.truststore
 INFO [main] 2014-01-30 08:25:00,475 Clearing ssl.truststore.password
 INFO [main] 2014-01-30 08:25:00,476 Setting ssl.store.type to JKS
 INFO [main] 2014-01-30 08:25:00,477 Clearing kerberos.service.principal.name
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.principal
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.useTicketCache
 INFO [main] 2014-01-30 08:25:00,481 Clearing kerberos.ticketCache
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.useKeyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.keyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.renewTGT
 INFO [main] 2014-01-30 08:25:00,488 Clearing kerberos.debug
 INFO [main] 2014-01-30 08:25:00,495 Starting Stomp
 INFO [main] 2014-01-30 08:25:00,495 SSL communication is disabled
 INFO [main] 2014-01-30 08:25:00,495 Creating stomp connection to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,521 Connecting to Cassandra cluster: 23.253.64.169 (port 9160)
 INFO [StompConnection receiver] 2014-01-30 08:25:00,536 Reconnecting in 0s.
 INFO [StompConnection receiver] 2014-01-30 08:25:00,561 Connected to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,619 Downed Host Retry service started with queue size -1 and retry delay 10s
 INFO [thrift-init] 2014-01-30 08:25:00,662 Registering JMX me.prettyprint.cassandra.service_Agent Cluster:ServiceType=hector,MonitorType=hector
 INFO [main] 2014-01-30 08:25:00,732 Starting Jetty server: {:port 61621, :host "10.183.132.141", :ssl? false, :join? false}
ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN TRIGGERED for host 23.253.64.169(23.253.64.169):9160
ERROR [thrift-init] 2014-01-30 08:25:00,886 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}; IsActive?: true; Active: 0; Blocked: 1; Idle: 0; NumBeforeExhausted: 1
 INFO [thrift-init] 2014-01-30 08:25:00,887 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,901 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,902 Host detected as down was added to retry queue: 23.253.64.169(23.253.64.169):9160
 WARN [thrift-init] 2014-01-30 08:25:00,914 Could not fullfill request on this host null
 WARN [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] 2014-01-30 08:25:00,910 Downed 23.253.64.169(23.253.64.169):9160 host still appears to be down: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
 WARN [thrift-init] 2014-01-30 08:25:00,926 Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:180)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:38)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:162)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:94)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:250)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
        at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:174)
        ... 16 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
        ... 18 more
ERROR [thrift-init] 2014-01-30 08:25:00,965 Error when performing thrift operation:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
        at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:395)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,024 Got new config from OpsCenter: {:kerberos_use_keytab true, :rollups300_ttl 2419200, :kerberos_use_ticket_cache true, :rollups60_ttl 604800, :thrift_port 9160, :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :rollups7200_ttl 31536000, :thrift_ssl_truststore nil, :metrics_ignored_column_families "", :cassandra_log_location "/var/log/cassandra/system.log", :thrift_rpc_interface "10.183.132.141", :thrift_ssl_truststore_password nil, :jmx_port 7199, :provisioning 0, :use_ssl 0, :kerberos_debug false, :rollups86400_ttl -1, :api_port "61621", :storage_keyspace "OpsCenter", :kerberos_renew_tgt true, :metrics_ignored_solr_cores "", :thrift_ssl_truststore_type "JKS", :metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter", :rollup_subscriptions [], :cassandra_install_location ""}
 INFO [StompConnection receiver] 2014-01-30 08:25:01,030 Starting up agent collection.
 INFO [StompConnection receiver] 2014-01-30 08:25:01,040 New JMX connection (127.0.0.1:7199)
ERROR [StompConnection receiver] 2014-01-30 08:25:01,073 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [Jetty] 2014-01-30 08:25:01,160 Jetty server started
 INFO [StompConnection receiver] 2014-01-30 08:25:01,188 Starting OS metric collectors (Linux)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,199 Starting Cassandra JMX metric collectors
 INFO [install-location-finder] 2014-01-30 08:25:01,250 New JMX connection (127.0.0.1:7199)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,252 New JMX connection (127.0.0.1:7199)
ERROR [install-location-finder] 2014-01-30 08:25:01,261 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
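
The agent log reports refused connections on JMX (127.0.0.1:7199) and Thrift (port 9160), and OpsCenter reports a refused connection to the agent API on port 61621. A quick probe run on the Cassandra node can show which of these ports is actually accepting connections at a given moment. This is only a minimal sketch: the `port_open` and `check_ports` helpers and the `NODE_IP` variable are hypothetical names, and it assumes `bash` and the coreutils `timeout` command are installed.

```shell
#!/bin/sh
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be available.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe every port given after the host and print one line per port.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "${host}:${port} open"
        else
            echo "${host}:${port} refused or unreachable"
        fi
    done
}

# Port numbers taken from the logs above:
# 9160 (Thrift), 7199 (JMX), 61621 (agent HTTP API).
check_ports "${NODE_IP:-127.0.0.1}" 9160 7199 61621
```

If 9160 and 7199 are refused while the agent is running, that would be consistent with the Cassandra process itself being down rather than with a misconfigured address.yaml.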

2 Answers:

Answer 0: (score: 1)

This forum thread seems to cover some of the problems that can occur with this setup:

http://www.datastax.com/support-forums/topic/opscenter-agent-not-connecting-to-opscenter

Answer 1: (score: 1)

There is an out-of-memory error in /var/log/messages:
Jan 30 20:06:39 hostname kernel: Out of memory: Kill process 2900 (java) score 788 or sacrifice child
Jan 30 20:06:39 hostname kernel: Killed process 2900, UID 0, (java) total-vm:1383360kB, anon-rss:717176kB, file-rss:113316kB

I am using the same setup. The cassandra-env.sh script obtains the amount of available memory using
system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`

However, on this system (Linux hostname 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux), free -m reports free memory in column 4. Changing the above to

system_memory_in_mb=`free -m | awk '/Mem:/ {print $4}'`

and saving the file allowed Cassandra to start without running out of memory.
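
To illustrate why the column matters, here is a small sketch that parses a sample of `free -m` output and shows how the heap size derived from it changes. The sample numbers are made up, the `calc_max_heap_mb` helper is hypothetical, and the sizing rule it uses, max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)), is my assumption about how cassandra-env.sh picks the default heap; check the script on your own node.

```shell
#!/bin/sh
# Made-up sample of `free -m` output in the older procps layout.
sample_free_output='             total       used       free     shared    buffers     cached
Mem:           992        871        121          0         12        310
-/+ buffers/cache:        549        443
Swap:            0          0          0'

# On the "Mem:" line, field 2 is total RAM and field 4 is currently free RAM.
total_mb=$(printf '%s\n' "$sample_free_output" | awk '/Mem:/ {print $2}')
free_mb=$(printf '%s\n' "$sample_free_output" | awk '/Mem:/ {print $4}')

# Assumed sizing rule: max(min(ram/2, 1024), min(ram/4, 8192)), in MB.
calc_max_heap_mb() {
    half=$(( $1 / 2 ))
    quarter=$(( $1 / 4 ))
    if [ "$half" -gt 1024 ]; then half=1024; fi
    if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
    if [ "$half" -gt "$quarter" ]; then echo "$half"; else echo "$quarter"; fi
}

echo "from total (${total_mb} MB): heap $(calc_max_heap_mb "$total_mb") MB"
echo "from free  (${free_mb} MB): heap $(calc_max_heap_mb "$free_mb") MB"
```

With these sample numbers, sizing from the free column gives a much smaller heap than sizing from total RAM, which would explain why the kernel's OOM killer stops targeting the java process on a small machine that also runs the agent. Setting MAX_HEAP_SIZE and HEAP_NEWSIZE explicitly in cassandra-env.sh is another way to get the same effect without changing the awk column.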