I am trying to use Cassandra to store tracking data. The GET requests carrying the tracking information (about 300 req/sec) are handled by Django via uwsgi behind nginx. The Django application writes directly to Cassandra (a single cluster with RF 3). For this I use cassandra-driver (2.5.0).
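For reference, the connection to Cassandra is set up with the driver roughly like the following sketch (the contact point matches the listen_address from the config further down; the keyspace name 'tracking' is only a placeholder, not the real one):

from cassandra.cluster import Cluster

# Minimal cassandra-driver setup as used from the Django application.
# The keyspace name 'tracking' is a placeholder.
cluster = Cluster(['127.0.0.1'], port=9042)
session = cluster.connect('tracking')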
After restarting all the services involved, everything works for about two hours. Then the server goes down and starts returning server errors (500).
At the moment I have no idea where the bottleneck is. Syslog shows normal I/O activity on my system, and the CPU is only at about 32%.
I mostly work with counters. My table looks like this:
CREATE TABLE rollup_hour_counter (
    name text,
    player_id text,
    hour timestamp,
    "count" counter,
    PRIMARY KEY ((name, player_id), hour)
)
The code I use to write to Cassandra looks like this:
rows = session.execute('UPDATE rollup_hour_counter '
                       'SET count = count + 1 '
                       'WHERE player_id=\'%s\' AND hour=\'%s:00:00\' AND name = \'name\'' % (player_uuid, date_now_hour))
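For comparison, the same increment could also go through a prepared statement so the driver binds the values instead of the query string being rebuilt on every request. This is only a sketch, not what is running right now; it assumes the hour is passed as a Python datetime truncated to the full hour:

from datetime import datetime

# Prepare once (e.g. at application start-up), then bind per request.
# `session` and `player_uuid` are the same objects as in the snippet above.
increment_stmt = session.prepare(
    'UPDATE rollup_hour_counter '
    'SET count = count + 1 '
    'WHERE player_id = ? AND hour = ? AND name = ?')

hour = datetime.utcnow().replace(minute=0, second=0, microsecond=0)
session.execute(increment_stmt, (player_uuid, hour, 'name'))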
I did some research on the errors in the log files.
The uwsgi log shows:
Thu Apr 16 07:44:04 2015 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] during GET /t/i/7bcdd5e185fd4608b0d3c3451f5ec56a/ (146.58.126.229)
The Cassandra log shows:
WARN [SharedPool-Worker-12] 2015-04-16 07:44:50,424 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-12,5,main]: {}
java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2174) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_76]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.4.jar:2.1.4]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_76]
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:146) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at org.apache.cassandra.db.CounterMutation.apply(CounterMutation.java:122) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at org.apache.cassandra.db.CounterMutation.apply(CounterMutation.java:122) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1147) ~[apache-cassandra-2.1.4.jar:2.1.4]
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2171) ~[apache-cassandra-2.1.4.jar:2.1.4]
    ... 4 common frames omitted
The nginx log shows:
2015/04/16 07:43:57 [error] 2111#0: *268738 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 82.115.124.102, server: server.someservice.com, request: "GET [...] HTTP/1.1", upstream: "uwsgi://unix:///tmp/track.sock", host: "server2.someotherservice.com", referrer: "[..]"
At first I thought the uwsgi processes and the socket were to blame. I added some worker processes and increased the buffer size, but that didn't help.
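The uwsgi changes were roughly the following options; the socket path is the one from the nginx log above, and the numbers are placeholders rather than the exact values that were tried:

[uwsgi]
socket = /tmp/track.sock
# placeholder: more worker processes
processes = 8
# placeholder: larger request buffer in bytes
buffer-size = 32768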
Second, I tuned some Cassandra settings, which didn't help either.
My Cassandra settings look like this:
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "127.0.0.1"
concurrent_reads: 96
concurrent_writes: 96
concurrent_counter_writes: 96
file_cache_size_in_mb: 1512
memtable_allocation_type: heap_buffers
memtable_flush_writers: 96
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: localhost
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
rpc_min_threads: 60
rpc_max_threads: 96
thrift_framed_transport_size_in_mb: 15
memtable_flush_after_mins: 60
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra
    # require_client_auth: false
    # Set trustore and truststore_password if require_client_auth is true
    # truststore: conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
internode_compression: all
inter_dc_tcp_nodelay: false
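One more thing that may matter for judging the bottleneck: each request currently blocks the uwsgi worker until the counter update is acknowledged, since session.execute() is synchronous. A non-blocking variant with the driver would look roughly like the sketch below (it reuses the prepared statement from the earlier sketch; the logger is just a placeholder). I don't know whether this would address the timeouts or only hide them.

import logging

log = logging.getLogger(__name__)

# Fire the counter update without blocking the web worker; failures are only
# logged. `increment_stmt`, `player_uuid` and `hour` are the names from the
# prepared-statement sketch above.
future = session.execute_async(increment_stmt, (player_uuid, hour, 'name'))
future.add_errback(lambda exc: log.warning('counter update failed: %s', exc))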
Does anyone have an idea where the bottleneck might be?