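Using CTAS, we can take advantage of the parallelism PolyBase provides to load data into a new table in a highly scalable and performant way.
Is there a way to use a similar approach to load data into an existing table? The table might even be empty.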
Creating an external table and loading from it with INSERT ... SELECT would, I assume, go through the head node and therefore not run in parallel?
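I know I could also drop the table and recreate it with CTAS, but then I would have to deal with all the metadata again (column names, data types, distribution, ...).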
Answer 0 (score: 2)
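You can do this with partition switching, but remember not to use too many partitions in Azure SQL Data Warehouse. See the "Partition sizing guidance" for details.
Keep in mind that check constraints are not supported, so the source table must use the same partition scheme as the target table.
A full example of the partitioning and switch syntax: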
-- Assume we have a file with the values 1 to 100 in it.
-- Create an external table over it; it will contain all the records.
IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'ext' )
EXEC ( 'CREATE SCHEMA ext' )
GO
-- DROP EXTERNAL TABLE ext.numbers
IF NOT EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('ext.numbers') )
CREATE EXTERNAL TABLE ext.numbers (
number INT NOT NULL
)
WITH (
LOCATION = 'numbers.csv',
DATA_SOURCE = eds_yourDataSource,
FILE_FORMAT = ff_csv
);
GO
-- Create a partitioned, internal table with the records 1 to 50
IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers
CREATE TABLE dbo.numbers
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number BETWEEN 1 AND 50;
GO
-- DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers')
-- CTAS the second half of the external table, records 51-100 into an internal one.
-- As check constraints are not available in SQL Data Warehouse, ensure the switch table
-- uses the same scheme as the original table.
IF OBJECT_ID('dbo.numbers_part2') IS NOT NULL DROP TABLE dbo.numbers_part2
CREATE TABLE dbo.numbers_part2
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number > 50;
GO
-- Partition switch it into the original table
ALTER TABLE dbo.numbers_part2 SWITCH PARTITION 2 TO dbo.numbers PARTITION 2;
SELECT *
FROM dbo.numbers
ORDER BY 1;
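
One way to confirm that the switch did what was intended is to look at the row counts per partition in both tables. The sketch below reuses DBCC PDW_SHOWPARTITIONSTATS, which the example already mentions in a comment. The expected counts in the comments are an assumption: they hold only if numbers.csv contains exactly the values 1 to 100, as stated at the top of the example.

```sql
-- Per-partition row counts for both tables after the switch.
-- Assumption: numbers.csv contains exactly the values 1 to 100.
-- The output has one row per partition per distribution, so sum row_count per partition.
DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers');        -- partitions 1 and 2 should each total 50 rows
DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers_part2');  -- every partition should now be empty

-- Cross-check with plain counts.
SELECT COUNT(*) AS rows_in_target      FROM dbo.numbers;        -- 100 under the assumption above
SELECT COUNT(*) AS rows_left_in_source FROM dbo.numbers_part2;  -- 0 under the assumption above
```

Because the switch is a metadata operation, dbo.numbers_part2 still exists afterwards and can be dropped or reused for the next load.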