Question

I have started learning CH and seem to have hit a dead end while trying to improve query speed. The table is created like this:

CREATE TABLE default.stats(
  aa String, 
  ab String, 
  user_id UInt16, 
  ac UInt32,  
  ad UInt8, 
  ae UInt8, 
  created_time DateTime, 
  created_date Date, 
  af UInt8, 
  ag UInt32, 
  ah UInt32, 
  ai String, 
  aj String) 
ENGINE = MergeTree 
PARTITION BY toYYYYMM(created_time) 
ORDER BY(created_time, user_id)

I am running a query like this:

SELECT ad, created_time, ab, aa, user_id, ac, ag, af 
FROM stats 
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12 
ORDER BY created_time DESC 
LIMIT 50 OFFSET 0

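This is the result: 50 rows in set. Elapsed: 2.881 sec. Processed 74.62 million rows

If I run the same query without the ORDER BY part, I get: 50 rows in set. Elapsed: 0.020 sec. Processed 491.50 thousand rows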

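If, in theory, the query only needs to sort about 10k rows (everything it returns without the LIMIT), why does it appear to process every row in the table? What am I missing, and/or how can I improve the speed of CH?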

Answer 1

Try ORDER BY created_time DESC, user_id

The optimize_read_in_order feature was implemented in ClickHouse release 19.14.3.3, 2019-09-10.
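A minimal sketch of that suggestion applied to the query from the question. The version and system.settings checks are additions for illustration, not part of the original advice. Whether the optimization actually applies with the WHERE filters in place is something to verify on your own data.

```sql
-- Server version; the feature was released in 19.14.3.3.
SELECT version();

-- Current value of the setting for this session.
-- An empty result suggests the server predates the setting.
SELECT name, value, changed
FROM system.settings
WHERE name = 'optimize_read_in_order';

-- The question's query with the suggested ORDER BY.
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12
ORDER BY created_time DESC, user_id
LIMIT 50 OFFSET 0;
```

If the "Processed ... rows" figure printed by the client drops well below the table size, the read-in-order path is probably being used.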

Answer 2

Tested on CH 19.17.4.11:
CREATE TABLE stats
(
    `aa` String,
    `ab` String,
    `user_id` UInt16,
    `ac` UInt32,
    `ad` UInt8,
    `ae` UInt8,
    `created_time` DateTime,
    `created_date` Date,
    `af` UInt8,
    `ag` UInt32,
    `ah` UInt32,
    `ai` String,
    `aj` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_time)
ORDER BY (created_time, user_id)

insert into stats(created_time, user_id) select toDateTime(intDiv(number,100)), number%103 from numbers(100000000)


SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0

5 rows in set. Elapsed: 0.013 sec. Processed 835.84 thousand rows,


set  optimize_read_in_order = 0

SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0

5 rows in set. Elapsed: 0.263 sec. Processed 100.00 million rows

Check the difference between set optimize_read_in_order = 0 and set optimize_read_in_order = 1.

I don't understand why optimize_read_in_order is not working in your case.
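One way to run that comparison against the original query is sketched here. It uses the per-query SETTINGS clause so the session default stays untouched, assuming your version accepts that clause; otherwise use SET before each run, as above. The filter values are the placeholders from the question.

```sql
-- Run 1: read-in-order optimization disabled.
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12
ORDER BY created_time DESC
LIMIT 50 OFFSET 0
SETTINGS optimize_read_in_order = 0;

-- Run 2: read-in-order optimization enabled.
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12
ORDER BY created_time DESC
LIMIT 50 OFFSET 0
SETTINGS optimize_read_in_order = 1;
```

Compare the "Elapsed" and "Processed ... rows" figures the client prints for the two runs. If they are the same, the setting is not taking effect for this query shape.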

ClickHouse MergeTree slow SELECT with ORDER BY
