JVM state determined to be unstable

Time: 2018-01-16 13:32:54

Tags: cassandra san

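Apologies for the unstructured post. This is my first time posting here, and I am not a developer. Any help would be greatly appreciated! Thanks in advance.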

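We provide customer support to a customer who purchased our product, which uses Cassandra as its database. The customer has a single Cassandra node and is using a SAN device. We know this is probably bad practice. I am aware of the following article: https://www.datastax.com/dev/blog/impact-of-shared-storage-on-apache-cassandra
The customer's storage (the Cassandra database) crashes every 2-10 hours with the following exception: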

  ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2017-12-10 19:54:16,279 JVMStabilityInspector.java:118 - JVM state determined to be unstable.  Exiting forcefully due to:
                org.apache.cassandra.io.FSWriteError: java.io.IOException: The semaphore timeout period has expired 
                        at org.apache.cassandra.db.commitlog.MemoryMappedSegment.write(MemoryMappedSegment.java:100) ~[apache-cassandra-2.2.8.jar:2.2.8]
                        at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:296) ~[apache-cassandra-2.2.8.jar:2.2.8]
                        at org.apache.cassandra.db.commitlog.CommitLog.sync(CommitLog.java:230) ~[apache-cassandra-2.2.8.jar:2.2.8] 
                        at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:93) ~[apache-cassandra-2.2.8.jar:2.2.8]
                        at java.lang.Thread.run(Unknown Source) [na:1.8.0_151] 
                Caused by: java.io.IOException: The semaphore timeout period has expired 
                        at java.nio.MappedByteBuffer.force0(Native Method) ~[na:1.8.0_151] 
                        at java.nio.MappedByteBuffer.force(Unknown Source) ~[na:1.8.0_151] 
                        at org.apache.cassandra.utils.SyncUtil.force(SyncUtil.java:113) ~[apache-cassandra-2.2.8.jar:2.2.8] 
                        at org.apache.cassandra.db.commitlog.MemoryMappedSegment.write(MemoryMappedSegment.java:96) ~[apache-cassandra-2.2.8.jar:2.2.8]
                        ... 4 common frames omitted   
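
For anyone who wants to reproduce the failing call outside Cassandra: the trace shows the error coming from `MappedByteBuffer.force()`, which flushes a memory-mapped file to the storage device. The following is only a minimal sketch of a standalone probe that times that same JDK call. The class name `ForceLatencyProbe`, the method `timeForceNanos`, the default file name `force-probe.tmp`, and the 1 MiB size are all made up for illustration and are not part of Cassandra or our product. It does not reproduce Cassandra's commit log logic.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical diagnostic helper, not taken from Cassandra.
public class ForceLatencyProbe {

    /**
     * Maps the first sizeBytes of the file, fills the mapping with data,
     * and returns how long the force() call took, in nanoseconds.
     * Any I/O failure during force() propagates to the caller as an exception.
     */
    static long timeForceNanos(Path file, int sizeBytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to sizeBytes if it is smaller.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
            while (buf.hasRemaining()) {
                buf.put((byte) 1); // dirty every byte so force() has work to do
            }
            long start = System.nanoTime();
            buf.force(); // same JDK call that appears in the stack trace
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass a file on the volume under test as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "force-probe.tmp");
        int size = 1024 * 1024; // 1 MiB, an arbitrary choice
        for (int i = 0; i < 10; i++) {
            long nanos = timeForceNanos(file, size);
            System.out.printf("force() #%d took %.3f ms%n", i, nanos / 1_000_000.0);
        }
    }
}
```

Running it once against a path on the SAN volume and once against a path on a local disk, for example `java ForceLatencyProbe <path-on-SAN>`, would at least show whether `force()` itself stalls or throws on that LUN.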

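My questions are:

  • Is it possible to make Cassandra work on the SAN at the cost of performance? The customer purchased the SAN device for use with our product. They are even willing to migrate our product from the existing RAID 5 LUN to a new RAID 10 LUN, but I am not sure whether that will work.
  • Is it worth trying to tune some of Cassandra's configuration parameters to see whether the database stops crashing? If so, which configuration parameters would affect this problem?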

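After monitoring the performance data and reviewing the exception, we decided to try to make Cassandra more stable by slowing it down. We changed the parameters that affect concurrent reads and writes. Our thinking was that once the Cassandra database is stable enough, we can start increasing some of the values again. Specifically, we changed the following properties in the cassandra.yaml file.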

    commitlog_sync_period_in_ms: 3600000
    concurrent_reads: 4
    concurrent_writes: 4
    concurrent_counter_writes: 4

Cassandra crashed after 1.5 hours.
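
For scale, the value 3600000 is in milliseconds, so with `commitlog_sync: periodic` the commit log is synced to disk roughly once an hour. A tiny sketch of the conversion follows; the class name `SyncPeriod` and its method are hypothetical and exist only to show the arithmetic.

```java
import java.time.Duration;

// Hypothetical helper that converts the configured sync period into minutes.
public class SyncPeriod {

    // Value copied from the changed commitlog_sync_period_in_ms setting.
    static final long COMMITLOG_SYNC_PERIOD_MS = 3_600_000L;

    static long toMinutes(long millis) {
        return Duration.ofMillis(millis).toMinutes();
    }

    public static void main(String[] args) {
        // 3,600,000 ms / 60,000 ms per minute = 60 minutes
        System.out.println(toMinutes(COMMITLOG_SYNC_PERIOD_MS) + " minutes between commit log syncs");
    }
}
```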

**cassandra.yaml:**

    batchlog_replay_throttle_in_kb: 1024
    role_manager: CassandraRoleManager
    roles_validity_in_ms: 2000
    disk_failure_policy: die
    disk_access_mode: standard 
    commit_failure_policy: die
    key_cache_save_period: 14400
    row_cache_size_in_mb: 0
    row_cache_save_period: 0
    counter_cache_size_in_mb: 0
    counter_cache_save_period: 7200
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 3600000
    commitlog_segment_size_in_mb: 128
    concurrent_reads: 4
    concurrent_writes: 4
    concurrent_counter_writes: 4
    file_cache_size_in_mb: 128
    memtable_heap_space_in_mb: 128
    memtable_offheap_space_in_mb: 128
    memtable_allocation_type: heap_buffers
    commitlog_total_space_in_mb: 1024
    index_summary_resize_interval_in_minutes: 60
    trickle_fsync: false
    trickle_fsync_interval_in_kb: 10240
    storage_port: 7100
    thrift_framed_transport_size_in_mb: 160
    incremental_backups: false
    column_index_size_in_kb: 64
    batch_size_warn_threshold_in_kb: 5
    batch_size_fail_threshold_in_kb: 50
    unlogged_batch_across_partitions_warn_threshold: 10    
    server_encryption_options:
        internode_encryption: none
        keystore: conf/.keystore
        keystore_password: cassandra
        truststore: conf/.truststore
        truststore_password: cassandra
    client_encryption_options:
        enabled: true
        optional: false
        require_client_auth: true


 Customer environment:
    ReleaseVersion: 2.2.8
    Windows 2012 R2
    Java 1.8.0_151

Resource Monitor of disk where Cassandra storage is located
Perfmon data

0 answers:

No answers