Cassandra ReadFailure

Time: 2018-09-16 22:53:24

Tags: cassandra cassandra-3.0

I have a single-node Cassandra setup. When running a SELECT count(*) ... WHERE query against a table, cqlsh returns the following error:

Full query:

SELECT count(*) FROM casb.o365_activity_log_by_date WHERE
creation_time > '2018-09-16 00:00:00' and creation_time < '2018-09-16 23:59:59' 
ALLOW FILTERING;

Response message:

ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
message="Operation failed - received 0 responses and 1 failures" 
info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Table schema:

CREATE TABLE IF NOT EXISTS casb.o365_activity_log_by_date (
    current_date date,
    creation_time timestamp,
    insertion_time timestamp,
    id text,
    client_ip text,
    workload text,
    operation text,
    user_id text,
    object_id text,
    activity_detail text,
    PRIMARY KEY ((current_date), insertion_time, id)
) WITH CLUSTERING ORDER BY (insertion_time DESC, id DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

I also have another Python-based application that reads from this table, and it appears to be stuck.

Logs:

/var/log/cassandra/system.log

WARN  [ReadStage-2] 2018-09-16 22:06:48,803 ReadCommand.java:533 - Read 58545 live rows and 100001 tombstone cells for query SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-16 00:00Z AND creation_time < 2018-09-16 23:59Z LIMIT 100 (see tombstone_warn_threshold)
ERROR [ReadStage-2] 2018-09-16 22:06:48,804 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-16 00:00Z AND creation_time < 2018-09-16 23:59Z LIMIT 100' (last scanned row partion key was ((2018-09-15), 2018-09-15 08:09Z, 72160ee4-5310-4941-af92-d27ced9c9ca8)); query aborted
WARN  [Native-Transport-Requests-1] 2018-09-16 22:07:02,937 SelectStatement.java:430 - Aggregation query used without partition key
WARN  [Native-Transport-Requests-1] 2018-09-16 22:07:45,946 SelectStatement.java:430 - Aggregation query used without partition key
WARN  [ReadStage-2] 2018-09-16 22:07:47,200 ReadCommand.java:533 - Read 58545 live rows and 100001 tombstone cells for query SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-16 00:00Z AND creation_time < 2018-09-16 23:59Z LIMIT 100 (see tombstone_warn_threshold)
ERROR [ReadStage-2] 2018-09-16 22:07:47,200 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-16 00:00Z AND creation_time < 2018-09-16 23:59Z LIMIT 100' (last scanned row partion key was ((2018-09-15), 2018-09-15 08:09Z, 72160ee4-5310-4941-af92-d27ced9c9ca8)); query aborted
WARN  [Native-Transport-Requests-1] 2018-09-16 22:17:52,810 SelectStatement.java:430 - Aggregation query used without partition key
WARN  [ReadStage-2] 2018-09-16 22:17:54,513 ReadCommand.java:533 - Read 58545 live rows and 100001 tombstone cells for query SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-17 00:00Z AND creation_time < 2018-09-17 23:59Z LIMIT 100 (see tombstone_warn_threshold)
ERROR [ReadStage-2] 2018-09-16 22:17:54,513 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM casb.o365_activity_log_by_date WHERE creation_time > 2018-09-17 00:00Z AND creation_time < 2018-09-17 23:59Z LIMIT 100' (last scanned row partion key was ((2018-09-15), 2018-09-15 08:09Z, 72160ee4-5310-4941-af92-d27ced9c9ca8)); query aborted
WARN  [Native-Transport-Requests-3] 2018-09-16 22:18:09,541 SelectStatement.java:430 - Aggregation query used without partition key
INFO  [ScheduledTasks:1] 2018-09-16 22:18:17,143 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
WARN  [Native-Transport-Requests-1] 2018-09-16 22:18:28,160 SelectStatement.java:430 - Aggregation query used without partition key
WARN  [Native-Transport-Requests-1] 2018-09-16 22:18:47,943 SelectStatement.java:430 - Aggregation query used without partition key
INFO  [CompactionExecutor:75] 2018-09-16 22:28:26,738 AutoSavingCache.java:394 - Saved KeyCache (48 items) in 250 ms
INFO  [IndexSummaryManager:1] 2018-09-16 22:29:27,992 IndexSummaryRedistribution.java:75 - Redistributing index summaries

More details:

I can run the following query against the same table without any errors:

SELECT * FROM casb.o365_activity_log_by_date;

Running the query above, I can see that some of the columns are mostly null. Given what the logs show, I suspect this is related to Cassandra tombstones.

What should I do here? I looked into this answer, so should I clean up the tombstones? I'm not sure.

1 answer:

Answer 0: (score: 1)

Your query runs into two Cassandra anti-patterns.

First, you are trying to count rows across your entire table. In Cassandra this means reading all of your data from disk, because there are no indices, as in an RDBMS, that could answer the question quickly. Be aware that SELECT * FROM foo; or SELECT count(*) FROM bar; will always be slow in Cassandra.
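Since current_date is the partition key of your table, restricting the count to a single day keeps the read within one partition instead of scanning everything. A sketch, assuming a per-day count is what you actually need:

```sql
-- Reads a single partition; no cross-partition scan, no ALLOW FILTERING.
SELECT count(*) FROM casb.o365_activity_log_by_date
WHERE current_date = '2018-09-16';
```

If you need a total over several days, issuing one such query per day and summing on the client side scales far better than one cluster-wide count(*).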

Second, you bypassed Cassandra's safeguard against doing exactly that: ALLOW FILTERING must be written explicitly, and it exists to protect you from reads that span the entire cluster. Your SELECT statement filters on creation_time, which is not part of your primary key, so every row has to be read and then filtered.
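If you do need the creation_time bounds on the current schema, you can at least add the partition key so the filtering is confined to one partition. ALLOW FILTERING is still required because creation_time is not a clustering column, but the scan is now bounded:

```sql
-- Still filters on a non-key column, but only within one partition.
SELECT count(*) FROM casb.o365_activity_log_by_date
WHERE current_date = '2018-09-16'
  AND creation_time > '2018-09-16 00:00:00'
  AND creation_time < '2018-09-16 23:59:59'
ALLOW FILTERING;
```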

So I would bet that you are hitting timeouts while your query runs. Have a look at Cassandra's system.log, often found under /var/log/cassandra/system.log when installed from packages.

In general, if you want to use Cassandra, I recommend going through some data modelling courses, such as those available from DataStax. It usually boils down to this: build your data model around the queries you will be running, and denormalize if necessary, so that each query ideally hits only one partition key.
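For your query pattern that could mean a second, denormalized table keyed for creation_time lookups. A sketch under that assumption (the table name here is hypothetical):

```sql
CREATE TABLE IF NOT EXISTS casb.o365_activity_log_by_creation_time (
    current_date date,
    creation_time timestamp,
    id text,
    -- remaining columns as in the original table
    PRIMARY KEY ((current_date), creation_time, id)
) WITH CLUSTERING ORDER BY (creation_time DESC, id DESC);

-- With creation_time as a clustering column, the range predicate is
-- served directly from the partition; no ALLOW FILTERING needed.
SELECT count(*) FROM casb.o365_activity_log_by_creation_time
WHERE current_date = '2018-09-16'
  AND creation_time > '2018-09-16 00:00:00'
  AND creation_time < '2018-09-16 23:59:59';
```

Your write path would then insert each event into both tables, which is the usual trade-off in Cassandra: duplicate on write to stay cheap on read.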