Cassandra WriteTimeoutException after high traffic

Time: 2015-04-16 07:03:57

Tags: python-2.7 uwsgi cassandra-2.0

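I am trying to use Cassandra to store tracking data. I handle GET requests (300 req/sec) and the tracking information with Django, running under uWSGI behind nginx. The Django application writes directly to Cassandra (a single cluster with a replication factor of 3). For this I use cassandra-driver (2.5.0).

After restarting all related services, everything works for about two hours. Then the server goes down and returns server error 500.

At the moment I do not know where the bottleneck is. Syslog shows normal I/O activity on my system, and CPU usage is only 32%.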
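One way to narrow down where the time goes is to measure each stage of a request separately. The following is only a minimal sketch using the standard library; the `timed` helper, the `timings` dictionary and the stage names are hypothetical and not part of the application described here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process store: stage name -> list of durations in seconds.
timings = defaultdict(list)


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.time()
    try:
        yield
    finally:
        # Recorded even if the block raises, so timeouts are counted too.
        timings[stage].append(time.time() - start)


def slowest(stage):
    """Return the longest recorded duration for a stage, or None if empty."""
    values = timings.get(stage)
    return max(values) if values else None
```

Wrapping the Cassandra call and the rest of the view in separate `with timed(...)` blocks would show which stage's latency grows before the failures start.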

I mostly work with counters. My table looks like this:

CREATE TABLE rollup_hour_counter (
  name text,
  player_id text,
  hour timestamp,
  "count" counter,
  PRIMARY KEY ((name, player_id), hour)
)

My code for writing to Cassandra looks like this:

# Values are passed as bound parameters instead of being interpolated into the string.
rows = session.execute(
    'UPDATE rollup_hour_counter '
    'SET count = count + 1 '
    "WHERE player_id = %s AND hour = %s AND name = 'name'",
    (player_uuid, '%s:00:00' % date_now_hour))
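For comparison, the same increment can be sketched with a prepared statement, so that the query is parsed once and only the values are sent on each request. This assumes `session` is a cassandra-driver `Session` already connected to the keyspace; `hour_bucket` and `make_incrementer` are hypothetical helper names introduced only for this example.

```python
from datetime import datetime

# Prepared statements use "?" as the bind marker.
UPDATE_CQL = (
    'UPDATE rollup_hour_counter '
    'SET "count" = "count" + 1 '
    'WHERE name = ? AND player_id = ? AND hour = ?'
)


def hour_bucket(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def make_incrementer(session):
    """Prepare the statement once and return a function that executes it."""
    prepared = session.prepare(UPDATE_CQL)

    def increment(name, player_id, ts):
        # The driver serializes the datetime for the timestamp column.
        return session.execute(prepared, (name, player_id, hour_bucket(ts)))

    return increment
```

The returned function would be created once per worker process, for example `increment = make_incrementer(session)`, and then called as `increment('name', player_uuid, datetime.utcnow())` in the request handler.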

I did some research on the errors in the log files.

The uwsgi log shows:

  

Thu Apr 16 07:44:04 2015 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] during GET /t/i/7bcdd5e185fd4608b0d3c3451f5ec56a/ (146.58.126.229)

The Cassandra log shows:

  

WARN [SharedPool-Worker-12] 2015-04-16 07:44:50,424 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-12,5,main]: {}
java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2174) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_76]
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.4.jar:2.1.4]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_76]
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
        at org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:146) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at org.apache.cassandra.db.CounterMutation.apply(CounterMutation.java:122) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at org.apache.cassandra.db.CounterMutation.apply(CounterMutation.java:122) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1147) ~[apache-cassandra-2.1.4.jar:2.1.4]
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2171) ~[apache-cassandra-2.1.4.jar:2.1.4]
        ... 4 common frames omitted

The nginx log shows:
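The trace above times out inside `CounterMutation.grabCounterLocks`, that is, while waiting for a counter lock, which may point to many concurrent increments hitting the same counter rows. One thing that could be tried, sketched here purely as an assumption and not as a confirmed fix, is to sum increments in the application and write them less often. `CounterBuffer` is a hypothetical class; `session` and `prepared` are assumed to be a cassandra-driver `Session` and a statement prepared from `UPDATE_CQL`.

```python
import threading
from collections import Counter

# The increment amount is a bind marker, so one write can carry many hits.
UPDATE_CQL = (
    'UPDATE rollup_hour_counter '
    'SET "count" = "count" + ? '
    'WHERE name = ? AND player_id = ? AND hour = ?'
)


class CounterBuffer(object):
    """Accumulate counter increments in memory and flush them in bulk."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = Counter()

    def add(self, name, player_id, hour, amount=1):
        """Record an increment without touching Cassandra."""
        with self._lock:
            self._pending[(name, player_id, hour)] += amount

    def flush(self, session, prepared):
        """Write one UPDATE per distinct row and return how many were sent."""
        # Swap the buffer under the lock so add() is never blocked by I/O.
        with self._lock:
            pending, self._pending = self._pending, Counter()
        for (name, player_id, hour), amount in pending.items():
            session.execute(prepared, (amount, name, player_id, hour))
        return len(pending)
```

The trade-off is durability: increments that have not been flushed are lost if the worker process dies, each uWSGI worker would hold its own buffer, and a failure in the middle of `flush` drops the remaining entries in this simplified version.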

  

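2015/04/16 07:43:57 [error] 2111#0: *268738 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 82.115.124.102, server: server.someservice.com, request: "GET [...] HTTP/1.1", upstream: "uwsgi://unix:///tmp/track.sock", host: "server2.someotherservice.com", referrer: "[..]"

At first I thought the uwsgi processes and the socket were responsible. I added more worker processes and increased the buffer size, but that did not help.

Next I adjusted some Cassandra settings, but that did not help either.

My Cassandra settings are as follows: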

num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:

    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "127.0.0.1"
concurrent_reads: 96
concurrent_writes: 96
concurrent_counter_writes: 96
file_cache_size_in_mb: 1512
memtable_allocation_type: heap_buffers
memtable_flush_writers: 96
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: localhost
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
rpc_min_threads: 60
rpc_max_threads: 96
thrift_framed_transport_size_in_mb: 15
memtable_flush_after_mins: 60
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100 
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra
    # require_client_auth: false
    # Set trustore and truststore_password if require_client_auth is true
    # truststore: conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
internode_compression: all
inter_dc_tcp_nodelay: false 

Does anyone know where the bottleneck is?

0 answers:

No answers yet