Question

我需要存储简单的数据 - 假设我有一些产品将代码作为主键，一些属性和有效范围。所以数据看起来像这样：

Products
code    value   begin_date  end_date
10905   13      2005-01-01  2016-12-31
10905   11      2017-01-01  null

这些范围不重叠，所以在每个日期我都有一系列独特的产品及其属性。因此，为了简化它的使用，我创建了这个功能：

create function dbo.f_Products
(
    @date date
)
returns table
as
return (
    select
    from dbo.Products as p
    where
        @date >= p.begin_date and
        @date <= p.end_date
)

这就是我将要使用它的方式：

select
    *
from <some table with product codes> as t
    left join dbo.f_Products(@date) as p on
        p.code = t.product_code

这一切都很好，但我怎么能让优化器知道这些行是否具有更好的执行计划？

我做了一些谷歌搜索，发现了一些非常好的DDL文章，可以防止在表格中存储重叠范围：

但即使我尝试这些约束，我也看到优化器无法理解结果记录集将返回唯一代码。

我想要的是某种方法，它给我的性能基本上与我在某个日期存储这些产品列表并使用date = @date选择它一样。

我知道有些RDMBS（比如PostgreSQL）有特殊的数据类型（Range Types）。但是SQL Server没有这样的东西。

我是否遗漏了某些内容或者在SQL Server中无法正确执行此操作？

Answer 1

您可以创建一个indexed view，其中包含该范围内每个code/date的行。

ProductDate (indexed view)
code    value   date
10905   13      2005-01-01
10905   13      2005-01-02
10905   13      ...
10905   13      2016-12-31
10905   11      2017-01-01
10905   11      2017-01-02
10905   11      ...
10905   11      Today

像这样：

create schema digits
go

create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go

create schema info
go

create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values 
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)

create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go

create view info.ProductDate with schemabinding 
as
select
    p.code,
    p.value,
    dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
    info.DateRange as dr
cross join
    digits.Ones as ones
cross join
    digits.Tens as tens
cross join
    digits.Hundreds as huns
cross join
    digits.Thousands as thos
cross join
    digits.TenThousands as tthos
join
    info.Products as p on
    dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go

create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go

select *
from info.ProductDate with (noexpand)
where 
    date = '2014-01-01'

drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go

Answer 2

没有差距的解决方案可能是这样的：

DECLARE @tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO @tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});

SELECT * FROM @tbl;

DECLARE @DateFilter DATE={d'2016-08-13'};

SELECT TOP 1 * 
FROM @tbl
WHERE [start_date]<=@DateFilter
ORDER BY [start_date] DESC

重要提示：确保start_date

上有（唯一）索引

更新：针对不同的产品

DECLARE @tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO @tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 1
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});

DECLARE @DateFilter DATE={d'2016-08-13'};

WITH PartitionedCount AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
          ,*
    FROM @tbl
    WHERE [start_date]<=@DateFilter
)
SELECT *
FROM PartitionedCount
WHERE Nr=1

Answer 3

首先，您需要为（begin_date，end_date，code）创建唯一的聚簇索引

然后SQL引擎将能够执行INDEX SEEK。

此外，您还可以尝试为dbo.Products表创建一个视图，以使用预先填充的dbo.Dates表连接该表。

ArrayList<String> myArrayList = new ArrayList()

然后在您的函数中，您将该视图用作“where @date = view.date”。结果可能更好或稍差......这取决于实际数据。

您还可以尝试将该视图编入索引（取决于更新的频率）。

或者，如果为[begin_date] .. [end_date]范围中的每个日期填充dbo.Products表，则可以获得更好的性能。

Answer 4

使用ROW_NUMBER方法扫描整个Products表一次。如果您在Products表中有很多产品代码且每个代码的有效范围很少，那么这是最好的方法。

WITH
CTE_rn
AS
(
    SELECT
        code
        ,value
        ,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
    FROM Products
    WHERE begin_date <= @date
)
SELECT *
FROM
    <some table with product codes> as t
    LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;

如果Products表中的每个代码的产品代码和有效范围很少，那么最好使用Products为每个代码查找OUTER APPLY表。

SELECT *
FROM
    <some table with product codes> as t
    OUTER APPLY
    (
        SELECT TOP(1)
            Products.value
        FROM Products
        WHERE
            Products.code = t.product_code
            AND Products.begin_date <= @date
        ORDER BY Products.begin_date DESC
    ) AS A
;

两种变体都需要(code, begin_date DESC) include (value)上的唯一索引。

请注意查询如何查看end_date，因为它们假设间隔没有间隙。它们将在SQL Server 2008中运行。

Answer 5

编辑：我原来的答案是使用INNER JOIN，但提问者想要一个LEFT JOIN。

CREATE TABLE Products
  (
  [Code] INT NOT NULL
  , [Value] VARCHAR(30) NOT NULL
  , Begin_Date DATETIME NOT NULL
  , End_Date DATETIME NULL
  )

/*
Products
code    value   begin_date  end_date
10905   13      2005-01-01  2016-12-31
10905   11      2017-01-01  null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)

CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])

CREATE TABLE SomeTableWithProductCodes
  (
  [CODE] INT NOT NULL 
  )

 INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)

这是一个原型查询，带有日期谓词。请注意，有更多最佳方式以防弹方式执行此操作，在上限使用“小于”运算符，但这是一个不同的讨论。

SELECT
  P.[Code]
  , P.[Value]
  , P.[Begin_Date]
  , P.[End_Date]
FROM
   SomeTableWithProductCodes ST
   LEFT JOIN Products AS P ON
     ST.[Code] = P.[Code]
     AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')

此查询将在Product表上执行Index Seek。

这是一个SQL小提琴：SQL Fiddle - Products and Dates

存储日期范围的有效方式

5 个答案:

更新：针对不同的产品