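My goal is to ingest data sorted on a particular column so that the partitions follow the same order, which should make pruning on that column more efficient.
I want to minimize the sorting cost and would like some guidance on how often I should recluster.
For example: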
CREATE TABLE test_order(n NUMBER, s STRING);
INSERT INTO test_order
VALUES
(12, 'a'),
(11, 'b'),
(10, 'c'),
(9, 'd'),
(8, 'e'),
(7, 'f'),
(6, 'g'),
(5, 'h'),
(6, 'i'),
(5, 'j'),
(4, 'k'),
(3, 'l'),
(2, 'm'),
(1, 'n');
SELECT * FROM test_order
ORDER BY n ASC;
ALTER TABLE test_order CLUSTER BY (n, s);
ALTER TABLE test_order RECLUSTER;
SELECT n, s FROM test_order;
SELECT SYSTEM$CLUSTERING_INFORMATION('test_order', '(n,s)');
Here is the clustering information after the first insert:
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 1,
"total_constant_partition_count" : 0,
"average_overlaps" : 0.0,
"average_depth" : 1.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 1,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
Here is the second insert, followed by the clustering information it produced:
INSERT INTO test_order
VALUES
(12, 'p'),
(11, 'f'),
(10, 'z'),
(9, 'y'),
(8, 'x'),
(7, 'w'),
(6, 'v'),
(5, 'u'),
(6, 't'),
(5, 's'),
(4, 'r'),
(3, 'q'),
(2, 'p'),
(1, 'o');
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 2,
"total_constant_partition_count" : 0,
"average_overlaps" : 1.0,
"average_depth" : 2.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 2,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
And after the second recluster:
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 2,
"total_constant_partition_count" : 0,
"average_overlaps" : 1.0,
"average_depth" : 2.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 2,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
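Sorry, I'm new to formatting. The clustering ratio doesn't change much after inserting in a specific order. Is that because my example dataset is too small, or does the order not matter for clustering performance?
Answer 0 (score: 1)
If you are ingesting sorted data, I don't think you need to cluster the table. Your data will be naturally clustered, and you will get the pruning you want.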
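To make the overlap and depth numbers in the question easier to read, here is a rough toy model in Python. It is only a sketch and not Snowflake's actual implementation: it assumes each partition can be summarized by the (min, max) of `n` alone, ignores `s`, and uses a simplified definition of depth. The function names and the second batch with values 13 to 24 are hypothetical and only there for illustration.

```python
from typing import List, Tuple

# Assumption: a partition is summarized by the (min, max) of column n only.
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True when two closed intervals share at least one key value."""
    return a[0] <= b[1] and b[0] <= a[1]


def average_overlaps(parts: List[Range]) -> float:
    """Mean number of *other* partitions that each partition overlaps."""
    counts = [
        sum(overlaps(p, q) for j, q in enumerate(parts) if j != i)
        for i, p in enumerate(parts)
    ]
    return sum(counts) / len(parts)


def average_depth(parts: List[Range]) -> float:
    """Mean, over partitions, of the deepest stack of partitions
    (including itself) found at any key value inside the partition."""
    depths = []
    for p in parts:
        # The deepest stack always occurs at some partition's start point,
        # so only start points that fall inside p need to be checked.
        points = {p[0]} | {q[0] for q in parts if p[0] <= q[0] <= p[1]}
        depths.append(
            max(sum(q[0] <= x <= q[1] for q in parts) for x in points)
        )
    return sum(depths) / len(parts)


# Both inserts in the question cover n = 1..12, so their ranges coincide.
same_range = [(1, 12), (1, 12)]
# Hypothetical sorted ingest: the second batch continues where the first ended.
disjoint = [(1, 12), (13, 24)]

print(average_overlaps(same_range), average_depth(same_range))  # 1.0 2.0
print(average_overlaps(disjoint), average_depth(disjoint))      # 0.0 1.0
```

In this model, two batches that both span `n` from 1 to 12 overlap completely no matter how the rows are ordered inside each batch, which matches the `average_overlaps` of 1.0 and `average_depth` of 2.0 reported in the question. Batches whose key ranges do not overlap stay at depth 1, which is the "naturally clustered" case.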