Question

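My goal is to ingest data sorted on a specific column so that the partitions are laid out in that order as well, making pruning on that column more efficient.

I want to minimize the cost of sorting, and I would like some guidance on how often I should recluster.

For example: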

CREATE TABLE test_order(n NUMBER, s STRING);
INSERT INTO test_order 
VALUES 
   (12, 'a'), 
   (11, 'b'), 
   (10, 'c'), 
   (9, 'd'), 
   (8, 'e'), 
   (7, 'f'), 
   (6, 'g'), 
   (5, 'h'), 
   (6, 'i'), 
   (5, 'j'), 
   (4, 'k'), 
   (3, 'l'), 
   (2, 'm'), 
   (1, 'n');

SELECT * FROM test_order 
ORDER BY n ASC;

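-- Define a clustering key on (n, s), then request a manual recluster.
-- Note: manual RECLUSTER is deprecated in current Snowflake in favor of Automatic Clustering.
ALTER TABLE test_order CLUSTER BY (n, s);
ALTER TABLE test_order RECLUSTER;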

SELECT n, s FROM test_order;
SELECT SYSTEM$CLUSTERING_INFORMATION('test_order', '(n,s)');

Here is the clustering information after the first insert:

{
  "cluster_by_keys" : "LINEAR(N, S)",
  "total_partition_count" : 1,
  "total_constant_partition_count" : 0,
  "average_overlaps" : 0.0,
  "average_depth" : 1.0,
  "partition_depth_histogram" : {
    "00000" : 0,
    "00001" : 1,
    "00002" : 0,
    "00003" : 0,
    "00004" : 0,
    "00005" : 0,
    "00006" : 0,
    "00007" : 0,
    "00008" : 0,
    "00009" : 0,
    "00010" : 0,
    "00011" : 0,
    "00012" : 0,
    "00013" : 0,
    "00014" : 0,
    "00015" : 0,
    "00016" : 0
  }
}

Here is the second insert, followed by the clustering information after it:

INSERT INTO test_order 
VALUES 
   (12, 'p'), 
   (11, 'f'), 
   (10, 'z'), 
   (9, 'y'), 
   (8, 'x'), 
   (7, 'w'), 
   (6, 'v'), 
   (5, 'u'), 
   (6, 't'), 
   (5, 's'), 
   (4, 'r'), 
   (3, 'q'), 
   (2, 'p'), 
   (1, 'o');

{
  "cluster_by_keys" : "LINEAR(N, S)",
  "total_partition_count" : 2,
  "total_constant_partition_count" : 0,
  "average_overlaps" : 1.0,
  "average_depth" : 2.0,
  "partition_depth_histogram" : {
    "00000" : 0,
    "00001" : 0,
    "00002" : 2,
    "00003" : 0,
    "00004" : 0,
    "00005" : 0,
    "00006" : 0,
    "00007" : 0,
    "00008" : 0,
    "00009" : 0,
    "00010" : 0,
    "00011" : 0,
    "00012" : 0,
    "00013" : 0,
    "00014" : 0,
    "00015" : 0,
    "00016" : 0
  }
}

And here it is after reclustering a second time:

{
  "cluster_by_keys" : "LINEAR(N, S)",
  "total_partition_count" : 2,
  "total_constant_partition_count" : 0,
  "average_overlaps" : 1.0,
  "average_depth" : 2.0,
  "partition_depth_histogram" : {
    "00000" : 0,
    "00001" : 0,
    "00002" : 2,
    "00003" : 0,
    "00004" : 0,
    "00005" : 0,
    "00006" : 0,
    "00007" : 0,
    "00008" : 0,
    "00009" : 0,
    "00010" : 0,
    "00011" : 0,
    "00012" : 0,
    "00013" : 0,
    "00014" : 0,
    "00015" : 0,
    "00016" : 0
  }
}

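Sorry, I'm new to formatting. After inserting in a specific order, the clustering ratio did not change much. Is that because my sample data set is too small, or does insertion order not matter for clustering performance?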

Answer 1

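If you are ingesting sorted data, I don't think you need to cluster the table. Your data will be naturally clustered, and you will get the pruning you want.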
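
As a rough sketch of what ingesting sorted data could look like, the statements below load a batch through an ORDER BY and then inspect the result. The staging table test_order_staging is a hypothetical name used only for illustration; it is not part of the question.

```sql
-- Hypothetical staging table that holds the unsorted incoming batch.
CREATE OR REPLACE TABLE test_order_staging (n NUMBER, s STRING);

-- Load the batch in sorted order so that each new micro-partition
-- covers a narrow, contiguous range of n.
INSERT INTO test_order
SELECT n, s
FROM test_order_staging
ORDER BY n;

-- Check how well the table is clustered on n. The column list is passed
-- explicitly, so this works even without a clustering key on the table.
SELECT SYSTEM$CLUSTERING_INFORMATION('test_order', '(n)');
```

Sorting within a batch only helps pruning when successive batches cover different ranges of n. In the question, both inserts span n = 1 to 12, so the two partitions overlap, which is consistent with the reported average_depth of 2.0. A table this small is probably not a meaningful test either way.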
