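I recently installed DSE 4.8.9 on a three-node cluster. The cluster was running well and was healthy. We have started deleting 4 million of our 14 million records, which created a number of tombstones. After restarting one of the nodes (10.0.106.7), that node no longer exposes port 9042, so I cannot connect to it with cqlsh. Port 7199 is exposed.
Machine setup: 150 GB data drive, 10 GB commit log on a separate disk, 32 GB RAM, 8 cores.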
I am seeing fairly high GC activity in system.log. You can see the system.log on pastebin.
This seems to be what causes C* to no longer expose port 9042.
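I tried leaving the machine running in this state for a few hours. The system log stays active, but cqlsh cannot connect. I get
Connection error: ('Unable to connect to any servers', {'10.0.106.7': error(111, "Tried connecting to [('10.0.106.7', 9042)]. Last error: Connection refused")})
when connecting with cqlsh from the same host.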
As a consequence, OpsCenter cannot connect to this instance either.
Any suggestions on how to get cqlsh working against this node again?
Update: output of nodetool status:
UN 10.0.106.5 80.46 GB 1 ? ec3f6f84-41bc-4ae5-85a1-59df023308a7 rack1
UN 10.0.106.6 67.02 GB 1 ? 47388e88-6079-4926-95a6-f4e7627a2037 rack1
UN 10.0.106.7 87.47 GB 1 ? 651c6633-0948-499c-8f7a-98041c87cfb2 rack1
Update 2:
netstat -an | grep 9042 returns nothing.
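For anyone who wants to repeat the same check from another machine without cqlsh, here is a minimal sketch that only tests whether a TCP connection to a port succeeds. The helper name port_is_open and the 2-second timeout are my own choices for illustration, not part of DSE or any Cassandra tooling.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or unreachable host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("10.0.106.7", 9042) and port_is_open("10.0.106.7", 7199) should mirror what is described above: a refused connection on the native transport port while the JMX port still accepts connections. It says nothing about why the port is closed.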
Excerpt from output.log:
INFO 20:37:37,178 Loading settings from file:/etc/dse/cassandra/cassandra.yaml INFO 20:37:37,287 Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=64; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=Gjallarhorn-Public-Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/srv/commitLog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=8192; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/srv/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=com.datastax.bdp.snitch.DseSimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=3074457345618258602; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=10.0.106.7; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=1; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.0.106.7; rpc_keepalive=true; rpc_port=9160; 
rpc_server_type=sync; saved_caches_directory=/srv/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.106.5,10.0.106.6,10.0.106.7}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; write_request_timeout_in_ms=2000] INFO 20:37:37,345 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO 20:37:37,402 Global memtable on-heap threshold is enabled at 5632MB INFO 20:37:37,402 Global memtable off-heap threshold is enabled at 5632MB INFO 20:37:37,406 Detected search service is enabled, setting my workload to Search INFO 20:37:37,407 Detected search service is enabled, setting my DC to Solr INFO 20:37:37,408 Initialized DseDelegateSnitch with workload Search, delegating to com.datastax.bdp.snitch.DseSimpleSnitch
ps -ef | grep cassandra
/usr/lib/jvm/java-8-oracle/jre//bin/java -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -ea -javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms22G -Xmx22G -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+ResizeTLAB -XX:CompileCommandFile=/etc/dse/cassandra/hotspot_compiler -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=1000 -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/dse/dse.pid -cp :/usr/share/dse/dse-core-4.8.9.jar:/usr/share/dse/dse-hadoop-4.8.9.jar:/usr/share/dse/dse-hive-4.8.9.jar:/usr/share/dse/dse-search-4.8.9.jar:/usr/share/dse/dse-spark-4.8.9.jar:/usr/share/dse/dse-sqoop-4.8.9.jar:/usr/share/dse/common/HdrHistogram-1.2.1.1.jar:/usr/share/dse/common/antlr-2.7.7.jar:/usr/share/dse/common/antlr-3.2.jar:/usr/share/dse/common/antlr-runtime-3.2.jar:/usr/share/dse/common/aopalliance-1.0.jar:/usr/share/dse/common/api-asn1-api-1.0.0-M33.jar:/usr/share/dse/common/api-asn1-ber-1.0.0-M33.jar:/usr/share/dse/common/api-i18n-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-client-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-core-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-standalone-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-aci-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-model-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-net-mina-1.0.0-M33.jar:/usr/share/dse/common/api-util-1.0.0-M33.jar:/usr/share/dse/common/asm-5.0.3.jar:/usr/share/dse/
common/commons-beanutils-1.9.2.jar:/usr/share/dse/common/commons-codec-1.9.jar:/usr/share/dse/common/commons-collections-3.2.2.jar:/usr/share/dse/common/commons-compiler-2.6.1.jar:/usr/share/dse/common/commons-configuration-1.6.jar:/usr/share/dse/common/commons-digester-1.8.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/commons-lang-2.6.jar:/usr/share/dse/common/commons-logging-1.1.1.jar:/usr/share/dse/common/commons-pool-1.6.jar:/usr/share/dse/common/guava-16.0.1.jar:/usr/share/dse/common/guice-3.0.jar:/usr/share/dse/common/guice-multibindings-3.0.jar:/usr/share/dse/common/jackson-annotations-2.2.2.jar:/usr/share/dse/common/jackson-core-2.2.2.jar:/usr/share/dse/common/jackson-databind-2.2.2.jar:/usr/share/dse/common/janino-2.6.1.jar:/usr/share/dse/common/java-uuid-generator-3.1.3.jar:/usr/share/dse/common/javassist-3.18.2-GA.jar:/usr/share/dse/common/javax.inject-1.jar:/usr/share/dse/common/jbcrypt-0.4d.jar:/usr/share/dse/common/jcl-over-slf4j-1.7.10.jar:/usr/share/dse/common/jline-1.0.jar:/usr/share/dse/common/journalio-1.4.2.jar:/usr/share/dse/common/jsr305-2.0.1.jar:/usr/share/dse/common/kmip-1.7.1e.jar:/usr/share/dse/common/log4j-1.2.13.jar:/usr/share/dse/common/mina-core-2.0.10.jar:/usr/share/dse/common/org.apache.servicemix.bundles.antlr-2.7.7_5.jar:/usr/share/dse/common/reflections-0.9.10.jar:/usr/share/dse/common/slf4j-api-1.7.10.jar:/usr/share/dse/common/stringtemplate-3.2.jar:/usr/share/dse/common/validation-api-1.1.0.Final.jar:/etc/dse:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/ST4-4.0.8.jar:/usr/share/dse/cassandra/lib/antlr-3.5.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/dse/cassandra/lib/cassandra-all-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.9.jar:/usr/share/dse/
cassandra/lib/commons-lang-2.6.jar:/usr/share/dse/cassandra/lib/commons-lang3-3.1.jar:/usr/share
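Answer 0 (score: 0)
I have faced the same issue in my own usage and got it fixed. A few suggestions:
Check whether you can see anything listening on port 9042 (for example with the netstat -an | grep 9042 from the question).
On Ubuntu, the setting for this port number lives in /etc/dse/cassandra/, in the file cassandra.yaml.
There is a line native_transport_port: 9042 in it. Try changing it and restart C*.
Share your updates and we can try to help further.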
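To double-check what the file actually says before changing anything, a rough sketch like the following can pull the relevant top-level settings out of cassandra.yaml. It deliberately avoids a YAML library and only handles plain `key: value` lines, so treat it as a quick inspection aid; the function name read_top_level_settings is hypothetical.

```python
import re


def read_top_level_settings(yaml_text, keys):
    """Extract simple top-level `key: value` settings from cassandra.yaml text."""
    found = {}
    for line in yaml_text.splitlines():
        # Skip blank lines, comments and indented (nested) entries
        if not line or line[0] in "# \t":
            continue
        # key, colon, value, optional trailing comment
        match = re.match(r"([A-Za-z_]+)\s*:\s*(.*?)\s*(?:#.*)?$", line)
        if match and match.group(1) in keys:
            found[match.group(1)] = match.group(2)
    return found
```

Reading /etc/dse/cassandra/cassandra.yaml and asking for native_transport_port and start_native_transport should show the same values that the node prints in its "Node configuration" line at startup (9042 and true in the question). Values that contain a # inside quotes are not handled by this sketch.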
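Answer 1 (score: 0)
After leaving the machine running for 1.5 days, it seems to have come back. I could not find a cause other than the extreme GC activity, so it may have been a GC death spiral.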
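If someone wants to put a number on "extreme GC activity", a small sketch like this can total the pauses reported in system.log. It assumes the GCInspector lines contain the text "GC in <N>ms", which is how I remember Cassandra 2.1 logging them; if your log lines look different, the regular expression needs adjusting. The function name summarize_gc is made up for this example.

```python
import re

# Assumption: GCInspector log lines contain "... GC in <N>ms"
GC_PAUSE = re.compile(r"GCInspector.*?\bGC in (\d+)ms")


def summarize_gc(lines):
    """Return count, total and longest GC pause (in ms) found in the log lines."""
    pauses = []
    for line in lines:
        match = GC_PAUSE.search(line)
        if match:
            pauses.append(int(match.group(1)))
    return {
        "count": len(pauses),
        "total_ms": sum(pauses),
        "max_ms": max(pauses) if pauses else 0,
    }
```

Passing an open file handle for /var/log/cassandra/system.log (the log directory shown in the ps output above) to summarize_gc gives a rough picture of how much time went into collections. A very large total relative to the time span of the log would be consistent with a death spiral, though it does not prove one.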