Question

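I have a large table ir_data (150 GB) that contains data for different dates (column val_date). At various points in my application I need to know which dates are available in ir_data.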

select distinct(val_date) from ir_data

In the experiments below, ir_data contains 29 distinct val_date values.

SETUP 1

I expected an index on ir_data (val_date, key_id, other_column) to help find the 29 values quickly. In fact, it takes more than 5 minutes:

Query 1 of 1, Rows: 29, Elapsed time (seconds) - Total: 343.96, SQL query: 343.958, Reading results: 0.002

I had always assumed that an index is a tree whose nodes are organized by the indexed columns, for example like this:

val_date -> key_id   -> other_column -> data-nodes

1.1.2017 -> 0-50     -> A            -> (1.1.2017, 0, Automobile), (1.1.2017, 2, Amsterdam)
                     -> B-E          -> (1.1.2017, 12, Batman)
         -> 51-100   -> A            -> ...
                        X
         -> 666-1000 -> A
                     -> B-C
                     -> E
2.1.2017 -> ...

Based on this structure, getting the 29 distinct val_date values should be very fast.

Question: Why does it take so long?

Sub-question: Is there a way to fix this without creating another table?

SETUP 2

I created another index that contains only val_date. The query takes about the same time.

Query plan:

    The type of query is SELECT.

2 operator(s) under root

   |ROOT:EMIT Operator (VA = 2)
   |
   |   |GROUP SORTED Operator (VA = 1)
   |   |Distinct
   |   |
   |   |   |SCAN Operator (VA = 0)
   |   |   |  FROM TABLE
   |   |   |  ir_data
   |   |   |  Index : ir_data_idx1 <-- this is the index containing only val_date.
   |   |   |  Forward Scan.
   |   |   |  Positioning at index start.
   |   |   |  Index contains all needed columns. Base table will not be read.
   |   |   |  Using I/O Size 16 Kbytes for index leaf pages.
   |   |   |  With MRU Buffer Replacement Strategy for index leaf pages.

Answer 1

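Your table is large, and so is the index. As you can see in the plan, the engine performs an index scan. This operation takes a long time because it reads the entire index, leaf page by leaf page, to collect the distinct values.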

As a first step, you could try update index statistics on the index, though I do not really expect it to help.

If this is a one-off manual operation, I guess you can live with a 5-minute run.

If it is a query that your application executes, there are two options I can think of:

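1. As you mention in your question: normalize the table by creating an additional table for the dates and referencing it with a foreign key.
2. Create a precomputed result set. This is a materialized view: the result is stored like a regular table (unlike an ordinary view, for which only the definition is stored). The results are refreshed automatically, and the values can be retrieved quickly. Important: as with an index, there is an impact on the performance of Insert, Update ... statements. It would look like this: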
```
create precomputed result set prs_ir_data
immediate refresh
as
select distinct val_date
from ir_data
```

You can read more about precomputed result sets here and here.
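
For comparison, the first option (a separate date table) could be sketched roughly as follows. The names ir_dates, pk_ir_dates and fk_ir_data_val_date are made up for illustration, and the datetime type for val_date is an assumption, since the question does not show the table definition. Note that the initial load still has to pay for the slow distinct query once.

```sql
-- Hypothetical lookup table holding one row per available date.
-- Assumption: val_date is a datetime column in ir_data.
create table ir_dates (
    val_date datetime not null,
    constraint pk_ir_dates primary key (val_date)
)

-- One-time load: this still runs the expensive distinct scan once.
insert into ir_dates (val_date)
select distinct val_date
from ir_data

-- Optional: enforce that every date in ir_data exists in ir_dates.
alter table ir_data
    add constraint fk_ir_data_val_date
    foreign key (val_date) references ir_dates (val_date)

-- The application then reads the small table instead of the 150 GB one.
select val_date
from ir_dates
```

With this design, the code that loads a new date into ir_data has to insert that date into ir_dates first, otherwise the foreign key rejects the rows.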

Answer 2

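A recursive CTE speeds up this kind of query considerably, i.e. the case of very few distinct values in a large table. The problem is that seeking through the index from one distinct value to the next, instead of scanning all of it, is currently not implemented. Here is a link to the approach.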
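
Since the link target is not reproduced here, the following is only a sketch of what such a recursive CTE commonly looks like, written in PostgreSQL-style syntax. Whether the asker's server accepts `with recursive` is an assumption. The idea is that each step asks for the smallest val_date greater than the previous one, which an index on val_date can answer with a single seek, so 29 distinct values cost about 30 seeks rather than a full index scan.

```sql
-- Sketch of a "loose index scan" emulated with a recursive CTE.
-- Assumption: the server supports recursive CTEs (PostgreSQL-style syntax).
with recursive dates as (
    -- Anchor: the smallest date in the table (one index seek).
    select min(val_date) as val_date
    from ir_data
    union all
    -- Step: the next larger date; yields NULL once the end is reached.
    select (select min(val_date)
            from ir_data
            where val_date > dates.val_date)
    from dates
    where dates.val_date is not null
)
select val_date
from dates
where val_date is not null;  -- drop the terminating NULL row
```

On a server without recursive CTEs, the same stepping can be written as a loop that repeatedly runs `select min(val_date) from ir_data where val_date > @last` and stops when the result is NULL.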

Puzzling index scan performance: why is scanning the index slow even though the result set is small and the column is indexed?
