Optimizing Cassandra query performance

Posted: 2018-08-14 23:21:00

Tags: database performance cassandra query-optimization database-schema

I'm using Cassandra to store 100M data entries, and I'm trying to optimize the read and write queries. Currently, the schema looks like this:

DROP KEYSPACE IF EXISTS reviews_db;

CREATE KEYSPACE reviews_db WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE reviews_db;

CREATE TABLE reviews(
id INT,
houseId INT, 
name TEXT,
picture TEXT,
reviewText TEXT,
reviewDate TEXT,
accuracyRating INT,
locationRating INT,
communicationRating INT,
checkinRating INT,
cleanlinessRating INT,
valueRating INT,
overallRating DECIMAL,
PRIMARY KEY(id, houseId)
);

CREATE INDEX ON reviews (houseId);

COPY reviews (id, houseId, name, picture, reviewText, reviewDate, accuracyRating, locationRating, communicationRating, checkinRating, cleanlinessRating, valueRating, overallRating) FROM './database/data/reviews1.csv' WITH DELIMITER=',' AND HEADER=FALSE;

When I run the query

select id,houseid from reviews where houseid = 9999954;

the trace looks like this:

Tracing session: 36fc1b20-a011-11e8-ac04-9109b2e8334a

activity                                                                                                                               | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                Execute CQL3 query | 2018-08-14 15:27:23.218000 | 127.0.0.1 |              0 | 127.0.0.1
                                     Parsing select id,houseid from reviews where houseid = 9999954; [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            253 | 127.0.0.1
                                                                                 Preparing statement [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            448 | 127.0.0.1
              Index mean cardinalities are reviews_houseid_idx:1. Scanning with reviews_houseid_idx. [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            968 | 127.0.0.1
                                                                           Computing ranges to query [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |           1073 | 127.0.0.1       
Submitting range requests on 257 ranges with a concurrency of 257 (0.003515625 rows per range expected) [Native-Transport-Requests-1] | 2018-08-14 15:27:23.220000 | 127.0.0.1 |           1668 | 127.0.0.1                                       
                                                               Submitted 1 concurrent range requests [Native-Transport-Requests-1] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2260 | 127.0.0.1
                                                Executing read on reviews_db.reviews using index reviews_houseid_idx [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2341 | 127.0.0.1
                                                     Executing single-partition query on reviews.reviews_houseid_idx [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2400 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2445 | 127.0.0.1
                                           Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2546 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1029 [ReadStage-2] | 2018-08-14 15:27:23.227000 | 127.0.0.1 |           8775 | 127.0.0.1
                                                                            Bloom filter allows skipping sstable 819 [ReadStage-2] | 2018-08-14 15:27:23.228000 | 127.0.0.1 |           9481 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1176 [ReadStage-2] | 2018-08-14 15:27:23.229000 | 127.0.0.1 |          10102 | 127.0.0.1
                                                                Partition index with 0 entries found for sstable 517 [ReadStage-2] | 2018-08-14 15:27:23.234000 | 127.0.0.1 |          15699 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1259 [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22535 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22724 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22751 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22779 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.251000 | 127.0.0.1 |          32604 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39903 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39959 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39987 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.260000 | 127.0.0.1 |          41753 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.269000 | 127.0.0.1 |          50605 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.275000 | 127.0.0.1 |          57061 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57325 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57412 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57462 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.278000 | 127.0.0.1 |          59387 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.287000 | 127.0.0.1 |          68588 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.294000 | 127.0.0.1 |          75900 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76188 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76267 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76321 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.302000 | 127.0.0.1 |          83846 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.313000 | 127.0.0.1 |          94648 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103400 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103745 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103833 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.322001 | 127.0.0.1 |         103901 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.336000 | 127.0.0.1 |         117832 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125175 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125275 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125346 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.347000 | 127.0.0.1 |         128201 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.358000 | 127.0.0.1 |         139767 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.367000 | 127.0.0.1 |         148635 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149174 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149290 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149357 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.371000 | 127.0.0.1 |         152815 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.379000 | 127.0.0.1 |         160651 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169148 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169607 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169690 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169759 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.389000 | 127.0.0.1 |         170955 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.399000 | 127.0.0.1 |         180652 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.406000 | 127.0.0.1 |         188039 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.407000 | 127.0.0.1 |         188130 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.407000 | 127.0.0.1 |         188180 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.412000 | 127.0.0.1 |         193070 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.420000 | 127.0.0.1 |         201613 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.427000 | 127.0.0.1 |         208842 | 127.0.0.1
                                                                              Read 9 live rows and 0 tombstone cells [ReadStage-2] | 2018-08-14 15:27:23.427000 | 127.0.0.1 |         209064 | 127.0.0.1
                                                                           Merged data from memtables and 3 sstables [ReadStage-2] | 2018-08-14 15:27:23.428000 | 127.0.0.1 |         209165 | 127.0.0.1
                                                                                                                  Request complete | 2018-08-14 15:27:23.427622 | 127.0.0.1 |         209622 | 127.0.0.1

The query takes 209 ms, and I'd like to get it under 50 ms. Is there any way I can get down to that time?

2 answers:

Answer 0 (score: 1)

Sure. Create a query table designed around houseId:

CREATE TABLE reviews_by_house_id(
  id INT,
  houseId INT, 
  name TEXT,
  picture TEXT,
  reviewText TEXT,
  reviewDate TEXT,
  accuracyRating INT,
  locationRating INT,
  communicationRating INT,
  checkinRating INT,
  cleanlinessRating INT,
  valueRating INT,
  overallRating DECIMAL,
  PRIMARY KEY(houseId,id));

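Secondary index queries will never reach that level of performance, even on a single-node instance. If you really need the original table, keep the two tables in sync with BATCHed writes. I bet that querying this table by houseId will meet your performance requirements.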
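A minimal sketch of what that could look like, assuming the CSV from the question is reused to load the new table and that new reviews are then written to both tables in one logged batch. The inserted values (id 100000001, name 'Jane', rating 4.5) are hypothetical, and only a few columns are shown in the batch.

```cql
USE reviews_db;

-- Load the query table from the same CSV used for the original table
COPY reviews_by_house_id (id, houseId, name, picture, reviewText, reviewDate, accuracyRating, locationRating, communicationRating, checkinRating, cleanlinessRating, valueRating, overallRating) FROM './database/data/reviews1.csv' WITH DELIMITER=',' AND HEADER=FALSE;

-- New reviews go to both tables in one logged batch (hypothetical values)
BEGIN BATCH
  INSERT INTO reviews (id, houseId, name, overallRating) VALUES (100000001, 9999954, 'Jane', 4.5);
  INSERT INTO reviews_by_house_id (id, houseId, name, overallRating) VALUES (100000001, 9999954, 'Jane', 4.5);
APPLY BATCH;

-- houseId is now the partition key, so this read targets a single partition
SELECT id, houseId FROM reviews_by_house_id WHERE houseId = 9999954;
```

A logged batch guarantees that both inserts are eventually applied, at the cost of some extra coordinator work per write, so it suits keeping denormalized copies in sync rather than bulk loading.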

Answer 1 (score: 0)

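You can't run an efficient query on a column that isn't the partition key (such as houseId), because Cassandra has to scan all existing partitions and extract data from them to match your field. If the query also had a condition on id (the partition key), it could go straight to a single partition.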
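One way to see the difference is to trace both kinds of query in cqlsh. This is a sketch: TRACING ON and TRACING OFF are cqlsh commands, and the id value 42 is hypothetical.

```cql
USE reviews_db;
TRACING ON;

-- id is the partition key of reviews, so this query is routed to one partition
-- (42 is a hypothetical id value)
SELECT id, houseId FROM reviews WHERE id = 42 AND houseId = 9999954;

-- houseId alone is not a partition key: the index has to be consulted across
-- the token ranges (the trace in the question shows 257 range requests)
SELECT id, houseId FROM reviews WHERE houseId = 9999954;

TRACING OFF;
```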

In Cassandra, you build your data model around the queries you need to run, so you have the following options:

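  • Create an auxiliary table with houseId as the partition key and populate it yourself (possibly with less data);
  • Use materialized views (although they are still considered an experimental feature);
  • Use a secondary index, but check it carefully, because secondary indexes are only suitable in specific cases. You can read more about them in this blog post.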
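For the materialized-view option, a sketch against the question's schema could look like the following. The view name reviews_by_house_mv is hypothetical, and depending on the Cassandra version, materialized views may have to be enabled in cassandra.yaml first.

```cql
USE reviews_db;

-- Cassandra maintains this view automatically from writes to reviews.
-- Every primary key column of the base table (id, houseId) must appear in the
-- view's primary key and be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW reviews_by_house_mv AS
  SELECT * FROM reviews
  WHERE houseId IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (houseId, id);

-- houseId is the view's partition key, so this reads a single partition
SELECT id, houseId FROM reviews_by_house_mv WHERE houseId = 9999954;
```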

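If you have the option of using DataStax Enterprise, there is one more possibility: DSE Search. Just create a search index on the table, and the Solr engine underneath DSE Search will handle the query (although the latency will be higher than with "plain Cassandra").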
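A rough sketch of the DSE Search route, assuming DSE 5.1 or later, where CREATE SEARCH INDEX and the solr_query pseudo-column are available. This will not work on open-source Apache Cassandra.

```cql
-- DSE only (assumed 5.1 or later); indexes the table's columns in Solr
CREATE SEARCH INDEX ON reviews_db.reviews;

-- Unquoted CQL identifiers are stored in lower case, hence houseid in the Solr query
SELECT id, houseid FROM reviews_db.reviews WHERE solr_query = 'houseid:9999954';
```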