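I am trying to use partitioned tables in Azure SQL Data Warehouse, but what I am seeing makes no sense to me. I am clearly doing something wrong, but I cannot work out what.
My intention is to populate the first table (Marc.foo) with 10000 rows, check the partition metadata, and then switch a partition into a second, empty table (Marc.foo2).
I start by creating the two partitioned tables: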
IF OBJECT_ID('Marc.foo', 'U') IS NOT NULL
DROP TABLE Marc.foo
GO
IF OBJECT_ID('Marc.foo2', 'U') IS NOT NULL
DROP TABLE Marc.foo2
GO
CREATE TABLE Marc.foo
(
id int NOT NULL
)
WITH
(
DISTRIBUTION = HASH (id),
CLUSTERED COLUMNSTORE INDEX,
PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))
)
GO
CREATE TABLE Marc.foo2
(
id int NOT NULL
)
WITH
(
DISTRIBUTION = HASH (id),
CLUSTERED COLUMNSTORE INDEX,
PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))
)
GO
Then I populate the first table (Marc.foo) with 10000 rows:
IF OBJECT_ID('tempdb..#numbers', 'U') IS NOT NULL
DROP TABLE #numbers
GO
WITH
CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id),
CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b),
CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b),
CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b),
CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b)
SELECT id
INTO #numbers
FROM CTE_64K
INSERT INTO Marc.foo(id)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM #numbers WHERE id <=10000
Because I have just loaded data into the table, I create statistics on it:
CREATE STATISTICS stats_Marc_foo_id ON Marc.foo(id) WITH FULLSCAN
Now I check the partition metadata:
SELECT sch.name AS [schema_name],
tbl.[name] AS [table_name],
ds.type_desc,
prt.[partition_number],
rng.[value] AS [current_partition_range_boundary_value],
prt.[rows] AS [partition_rows]
FROM sys.schemas sch
INNER JOIN sys.tables tbl ON sch.schema_id = tbl.schema_id
INNER JOIN sys.partitions prt ON prt.[object_id] = tbl.[object_id]
INNER JOIN sys.indexes idx ON prt.[object_id] = idx.[object_id] AND prt.[index_id] = idx.[index_id]
INNER JOIN sys.data_spaces ds ON idx.[data_space_id] = ds.[data_space_id]
INNER JOIN sys.partition_schemes ps ON ds.[data_space_id] = ps.[data_space_id]
INNER JOIN sys.partition_functions pf ON ps.[function_id] = pf.[function_id]
LEFT JOIN sys.partition_range_values rng ON pf.[function_id] = rng.[function_id] AND rng.[boundary_id] = prt.[partition_number]
WHERE sch.name = 'Marc' AND
tbl.name = 'foo'
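Question 1: This gives me the current_partition_range_boundary_value I expect, but partition_rows (which I expected to be 1000) returns 5957 rows for every partition.
Finally, I try to switch partition 1 of Marc.foo into Marc.foo2: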
ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1
I expected that selecting from Marc.foo2 would return 1000 rows with id values from 1 to 1000. Instead I get zero rows.
Question 2: What am I doing wrong?
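Answer 0 (score: 3)
There is a bug in your code. Your CTE returns the number 1 for every row, which you can confirm by inspecting the contents of the #numbers table. Because every id is 1, the condition id <= 10000 filters nothing, and the statement always brings back 65,536 rows.
Move the ROW_NUMBER up into the SELECT ... INTO, for example: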
WITH
CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id),
CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b),
CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b),
CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b),
CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
INTO #numbers
FROM CTE_64K
I guess the moral of the story is: don't write your own number-generation routine without checking its output :)
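A quick way to do that check is to summarize the temp table right after it is built. This is only a minimal sketch: it assumes #numbers still exists in the current session and has the single id column used above.

```sql
-- Sanity check on the numbers table, run right after SELECT ... INTO #numbers.
-- With the original CTE, min_id and max_id are both 1 (every row holds the value 1).
-- With ROW_NUMBER moved into the SELECT ... INTO, they should be 1 and 65536.
SELECT COUNT(*) AS row_count,
       MIN(id)  AS min_id,
       MAX(id)  AS max_id
FROM #numbers;
```

In both versions row_count is 65536, so the row count alone does not reveal the problem. The min and max values do.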
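Answer 1 (score: 3)
Numbers table aside, these were the questions:
Question 1: This gives me the current_partition_range_boundary_value I expect, but partition_rows (which I expected to be 1000) returns 5957 rows for every partition.
I still cannot get the result I expected for this one.
Finally, I tried to switch partition 1 from Marc.foo to Marc.foo2.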
ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1
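I expected that selecting from Marc.foo2 would return 1000 rows with id values from 1 to 1000. Instead I get zero rows.
Question 2: What am I doing wrong?
I had misunderstood RANGE RIGHT. If we look at the partition clause of the CREATE TABLE, we see: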
PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000,
6000, 7000, 8000, 9000))
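With RANGE RIGHT, each boundary value belongs to the partition on its right. This means rows with an id below zero (up to but not including 0) land in partition 1, and rows with an id from 0 through 999 land in partition 2.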
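To see where the rows should land without relying on the metadata views, the boundary rule can be spelled out by hand and the rows counted per bucket. This is a sketch only: the CASE expression is my own restatement of the RANGE RIGHT rule for these specific boundaries (0, 1000, ..., 9000), and expected_partition is a made-up column name, not something the engine exposes.

```sql
-- Hand-computed partition number for RANGE RIGHT boundaries 0, 1000, ..., 9000.
-- Partition 1:  id < 0
-- Partition 2:  0 <= id < 1000
-- ...
-- Partition 11: id >= 9000
SELECT p.expected_partition,
       COUNT(*)  AS row_count,
       MIN(p.id) AS min_id,
       MAX(p.id) AS max_id
FROM (
    SELECT id,
           CASE WHEN id < 0     THEN 1
                WHEN id >= 9000 THEN 11
                ELSE id / 1000 + 2   -- integer division on an int column
           END AS expected_partition
    FROM Marc.foo
) AS p
GROUP BY p.expected_partition
ORDER BY p.expected_partition;
```

Since every generated id is at least 1, no row falls into the first bucket, which matches the empty partition 1.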
Partition 1 contains no rows, so this is working as designed. If I switch partition 2 instead, the rows show up in Marc.foo2.
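For completeness, the partition 2 switch would look roughly like the following. It is a sketch that assumes Marc.foo2 still has the same definition as Marc.foo and that its partition 2 is empty, which is the case for the tables as created in the question.

```sql
-- Switch the partition that actually holds ids 0 through 999.
ALTER TABLE Marc.foo SWITCH PARTITION 2 TO Marc.foo2 PARTITION 2;

-- Confirm what arrived in the target table.
SELECT COUNT(*) AS row_count,
       MIN(id)  AS min_id,
       MAX(id)  AS max_id
FROM Marc.foo2;
```

Because the generated ids start at 1 rather than 0, I would expect the switched partition to hold ids 1 through 999 rather than a full 1000 rows.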