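I have started learning ClickHouse (CH) and seem to have hit a dead end while trying to improve query speed. The table is created like this: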
CREATE TABLE default.stats(
aa String,
ab String,
user_id UInt16,
ac UInt32,
ad UInt8,
ae UInt8,
created_time DateTime,
created_date Date,
af UInt8,
ag UInt32,
ah UInt32,
ai String,
aj String)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_time)
ORDER BY(created_time, user_id)
I am running a query like this:
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12
ORDER BY created_time DESC
LIMIT 50 OFFSET 0
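This returns:
50 rows in set. Elapsed: 2.881 sec. Processed 74.62 million rows.
If I run the same query without the ORDER BY clause, I get:
50 rows in set. Elapsed: 0.020 sec. Processed 491.5 thousand rows.
If, in theory, the query only needs to sort about 10k rows (the number of matching rows when no LIMIT is applied), why does it seem to process every row in the table? What am I missing, and/or how can I improve CH's speed?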
Answer 0 (score: 1)
Try ORDER BY created_time DESC, user_id
The optimize_read_in_order feature was implemented in ClickHouse release 19.14.3.3 (2019-09-10).
Answer 1 (score: 0)
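As a rough sketch, the suggestion above applied to the query from the question might look like the following. It assumes the server is at least the release mentioned above, and the explicit SET is only there for illustration (the setting may already be enabled on your server). Whether the filter on ab still forces a large scan is not something this sketch can answer.

```sql
-- Assumption: the server version supports optimize_read_in_order.
SET optimize_read_in_order = 1;

-- Same query as in the question; only the ORDER BY clause is changed
-- so that it lists the columns of the table's sorting key (created_time, user_id).
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12
ORDER BY created_time DESC, user_id
LIMIT 50 OFFSET 0;
```

Comparing the "Processed ... rows" figure before and after the change shows whether the read-in-order path was actually taken.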
CH 19.17.4.11
CREATE TABLE stats
(
`aa` String,
`ab` String,
`user_id` UInt16,
`ac` UInt32,
`ad` UInt8,
`ae` UInt8,
`created_time` DateTime,
`created_date` Date,
`af` UInt8,
`ag` UInt32,
`ah` UInt32,
`ai` String,
`aj` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_time)
ORDER BY (created_time, user_id)
insert into stats(created_time, user_id) select toDateTime(intDiv(number,100)), number%103 from numbers(100000000)
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0
5 rows in set. Elapsed: 0.013 sec. Processed 835.84 thousand rows,
set optimize_read_in_order = 0
SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0
5 rows in set. Elapsed: 0.263 sec. Processed 100.00 million rows
Compare the results with optimize_read_in_order = 0 versus optimize_read_in_order = 1.
I don't understand why optimize_read_in_order is not working in your case.
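One way to narrow this down, offered as a hedged starting point rather than a diagnosis, is to check which server version is running and what value the setting currently has in the session. The queries below use the built-in version() function and the system.settings table.

```sql
-- Which ClickHouse version is the server running?
SELECT version();

-- Does the setting exist in this version, and what is its current value?
-- changed = 1 means it differs from the default.
SELECT name, value, changed
FROM system.settings
WHERE name = 'optimize_read_in_order';
```

If the second query returns no rows, the server most likely predates the feature, which would explain why the ORDER BY query still reads the whole table.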