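Azure SQL Data Warehouse tables

Time: 2017-04-19 18:16:35

Tags: sql-server azure-sqldw

I am trying to use partitioned tables in Azure SQL Data Warehouse, but what I am seeing does not make sense to me. I am clearly doing something wrong, but I cannot work out what it is.

My intention is to populate the first table (Marc.foo) with 10000 rows, check the partition metadata, and then switch a partition into a second, empty table (Marc.foo2).

I first create two partitioned tables: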

IF OBJECT_ID('Marc.foo', 'U') IS NOT NULL
  DROP TABLE Marc.foo
GO

IF OBJECT_ID('Marc.foo2', 'U') IS NOT NULL
  DROP TABLE Marc.foo2
GO

CREATE TABLE Marc.foo
(
    id int NOT NULL
)
WITH 
(   
     DISTRIBUTION = HASH (id),
     CLUSTERED COLUMNSTORE INDEX, 
     PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))
)
GO

CREATE TABLE Marc.foo2
(
    id int NOT NULL
)
WITH 
(   
     DISTRIBUTION = HASH (id),
     CLUSTERED COLUMNSTORE INDEX, 
     PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))
)
GO

I then populate the first table (Marc.foo) with 10000 rows:

IF OBJECT_ID('tempdb..#numbers', 'U') IS NOT NULL
  DROP TABLE #numbers
GO

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b)
SELECT      id
INTO        #numbers
FROM        CTE_64K

INSERT INTO Marc.foo(id)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM #numbers WHERE id <=10000

Because I have just loaded data into the table, I create statistics on it:

CREATE STATISTICS stats_Marc_foo_id ON Marc.foo(id) WITH FULLSCAN

Now I check the partition metadata:

SELECT      sch.name AS [schema_name],
            tbl.[name] AS [table_name],
            ds.type_desc, 
            prt.[partition_number],
            rng.[value] AS [current_partition_range_boundary_value],
            prt.[rows] AS [partition_rows]
FROM        sys.schemas                             sch
            INNER JOIN sys.tables                   tbl ON  sch.schema_id       = tbl.schema_id
            INNER JOIN sys.partitions               prt ON  prt.[object_id]     = tbl.[object_id]
            INNER JOIN sys.indexes                  idx ON  prt.[object_id]     = idx.[object_id] AND prt.[index_id] = idx.[index_id]
            INNER JOIN sys.data_spaces              ds  ON  idx.[data_space_id] = ds.[data_space_id]
            INNER JOIN sys.partition_schemes        ps  ON  ds.[data_space_id]  = ps.[data_space_id]
            INNER JOIN sys.partition_functions      pf  ON  ps.[function_id]    = pf.[function_id]
            LEFT JOIN sys.partition_range_values    rng ON  pf.[function_id]    = rng.[function_id] AND rng.[boundary_id] = prt.[partition_number]
WHERE       sch.name = 'Marc' AND
            tbl.name = 'foo'

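Question 1: This gives me what I expect for current_partition_range_boundary_value, but partition_rows (which I expect to be 1000) returns 5957 rows for every partition.

Finally, I try to switch partition 1 of Marc.foo into Marc.foo2: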

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1

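I expect that when I select from Marc.foo2 I should see 1000 rows with id values from 1 to 1000. Instead I get zero rows.

Question 2: What am I doing wrong?

2 answers:

Answer 0: (score: 3)

There is a bug in your code. Your CTE returns the number 1 for every row, which you can confirm by inspecting the contents of the #numbers table. The condition id <= 10000 therefore has no effect, and the statement always returns 65,536 rows: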

1 1 1 1 1
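
A quick way to confirm this is to summarize the temp table rather than eyeball it. This is a minimal sketch against the #numbers table built in the question; the column aliases are illustrative and not part of the original post.

```sql
-- Summarize #numbers: with the original CTE every id is 1,
-- so min_id and max_id should both come back as 1.
SELECT  COUNT(*)  AS row_count,
        MIN(id)   AS min_id,
        MAX(id)   AS max_id
FROM    #numbers;
```

If min_id and max_id are equal, the filter id <= 10000 cannot be restricting anything.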

To correct this, move ROW_NUMBER up into the SELECT ... INTO, for example:

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b)
SELECT      ROW_NUMBER() OVER (ORDER BY (SELECT NULL))  AS id
INTO        #numbers
FROM        CTE_64K
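
With #numbers now holding distinct values, the load into Marc.foo can presumably select id directly. The answer does not show this statement, so the following is an assumed follow-up, not the answerer's exact code:

```sql
-- Assumed follow-up: id in #numbers is now 1..65536,
-- so the filter keeps exactly the first 10000 values.
INSERT INTO Marc.foo (id)
SELECT  id
FROM    #numbers
WHERE   id <= 10000;
```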

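I suppose the moral of the story is: don't write your own number-generation routine without checking it :)

Answer 1: (score: 3)

Apart from the numbers table, this was the problem: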

  

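Question 1: This gives me what I expect for current_partition_range_boundary_value, but partition_rows (which I expect to be 1000) returns 5957 rows for every partition.

I still cannot get what I expect for this.

Finally, I try to switch partition 1 of Marc.foo into Marc.foo2: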
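
One way to cross-check the per-partition counts, offered as a sketch and not as something from the original thread, is to sum row counts across distributions using the node-level DMVs. It assumes sys.pdw_table_mappings, sys.pdw_nodes_tables and sys.dm_pdw_nodes_db_partition_stats are available and readable in your warehouse, and the index_id filter is an assumption intended to count only the base table.

```sql
-- Sketch: per-partition row counts summed over all distributions.
SELECT      sch.name            AS [schema_name],
            tbl.name            AS [table_name],
            nps.partition_number,
            SUM(nps.row_count)  AS [partition_rows]
FROM        sys.schemas                         sch
            INNER JOIN sys.tables               tbl ON  sch.schema_id      = tbl.schema_id
            INNER JOIN sys.pdw_table_mappings   tm  ON  tbl.[object_id]    = tm.[object_id]
            INNER JOIN sys.pdw_nodes_tables     nt  ON  tm.physical_name   = nt.name
            INNER JOIN sys.dm_pdw_nodes_db_partition_stats nps
                                                    ON  nt.[object_id]     = nps.[object_id]
                                                    AND nt.pdw_node_id     = nps.pdw_node_id
                                                    AND nt.distribution_id = nps.distribution_id
WHERE       sch.name = 'Marc'
            AND tbl.name = 'foo'
            AND nps.index_id < 2    -- assumption: heap or clustered index only
GROUP BY    sch.name, tbl.name, nps.partition_number
ORDER BY    nps.partition_number;
```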

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1

I expect that when I select from Marc.foo2 I should see 1000 rows with id values from 1 to 1000. Instead I get zero rows.

  

Question 2: What am I doing wrong?

I misunderstood RANGE RIGHT. If we look at the partition clause of the CREATE TABLE, we see:

PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))

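With RANGE RIGHT, each boundary value belongs to the partition on its right. This means rows with an id less than 0 fall in partition 1, and rows with an id from 0 through 999 fall in partition 2.

There are no rows in partition 1, so this is working as designed. If I switch partition 2 instead, the rows show up in Marc.foo2.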
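
The switch described above can be sketched as follows. It assumes Marc.foo has been reloaded with the corrected ids 1 to 10000 and that Marc.foo2 is still empty; the column aliases are illustrative.

```sql
-- Under RANGE RIGHT, partition 2 covers 0 <= id < 1000.
ALTER TABLE Marc.foo SWITCH PARTITION 2 TO Marc.foo2 PARTITION 2;

-- Check what arrived in the target table.
SELECT  COUNT(*)  AS row_count,
        MIN(id)   AS min_id,
        MAX(id)   AS max_id
FROM    Marc.foo2;
```

Because the generated ids start at 1 and id 1000 belongs to partition 3, the switched partition should contain ids 1 through 999 under those assumptions.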