Alternative to a CTE with DELETE in SQL Data Warehouse

Date: 2019-02-11 06:01:03

Tags: sql azure-sql-database sql-delete sql-cte azure-sql-data-warehouse

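I want to delete all rows from a table whose batchId (a run number) is older than the two most recent batches. In Azure SQL Database I could probably do this with the following query: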

WITH CTE AS(
    SELECT
        *,
        DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
    FROM MyTable
)
DELETE FROM CTE WHERE RN>2

But according to this, that is not allowed in SQL Data Warehouse. I'm looking for alternatives.

3 answers:

Answer 0 (score: 1)

You can try using a JOIN:

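-- Note: DELETE ... FROM/JOIN works in Azure SQL Database, but SQL Data
-- Warehouse rejects a FROM clause in DELETE (see answer 2).
delete d from MyTable d
join
(
    SELECT DISTINCT
        BATCHID,
        -- rank whole batches (newest = 1), not rows within a batch
        RN = DENSE_RANK() OVER(ORDER BY BATCHID DESC)
    FROM MyTable
) A on d.BATCHID = A.BATCHID
where A.RN > 2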

Answer 1 (score: 0)

You can try:

-- Keeps only the most recent batch; no second FROM clause, so SQL DW accepts it
delete from mytable
    where batchId < (select max(batchid) from mytable);

Or, if you want to keep the two most recent batches, you could use:

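-- T-SQL has no LIMIT/OFFSET: take the smaller of the two newest batch IDs instead
delete from mytable
    where batchId < (select min(batchid)
                     from (select top 2 batchid
                           from mytable
                           group by batchid
                           order by batchid desc
                          ) as latest_two
                    );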

Answer 2 (score: 0)

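Azure SQL Data Warehouse supports only a limited T-SQL surface area. CTEs in DELETE operations, and DELETE statements with a FROM clause, are not supported and produce the following error:

  Msg 100029, Level 16, State 1, Line 1
  A FROM clause is currently not supported in a DELETE statement.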

It does, however, support subqueries, so one way to write such a statement is:

-- GROUP BY makes TOP 2 return two distinct batches rather than two rows
DELETE dbo.MyTable
WHERE BATCHID NOT IN ( SELECT TOP 2 BATCHID FROM dbo.MyTable GROUP BY BATCHID ORDER BY BATCHID DESC );

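Azure SQL Data Warehouse supports this syntax; I have tested it. I'm not sure how efficient it is on billions of rows. You could also consider partition switching.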
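A minimal sketch of the partition-switching alternative follows. It assumes that dbo.MyTable is already partitioned on BATCHID with one partition per batch; the boundary values 101, 102 and 103 and the staging table dbo.MyTable_Switch are hypothetical. For the switch to succeed, the staging table must match the source table's columns, distribution and partition boundaries, and the target partition must be empty. Because switching a partition is a metadata-only operation, the old batch's rows are not rewritten or deleted one by one.

```sql
-- Sketch only: assumes dbo.MyTable was created with
--   PARTITION ( BATCHID RANGE RIGHT FOR VALUES ( 101, 102, 103 ) )
-- so each batch sits in its own partition. Boundary values are hypothetical.

-- 1. Create an empty table with the same columns, distribution and partitions
CREATE TABLE dbo.MyTable_Switch
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH( BATCHID ),
    PARTITION ( BATCHID RANGE RIGHT FOR VALUES ( 101, 102, 103 ) )
)
AS
SELECT * FROM dbo.MyTable WHERE 1 = 2;
GO

-- 2. Move the old batch out (partition 2 holds 101 <= BATCHID < 102)
ALTER TABLE dbo.MyTable SWITCH PARTITION 2 TO dbo.MyTable_Switch PARTITION 2;
GO

-- 3. Discard the switched-out rows
DROP TABLE dbo.MyTable_Switch;
```

With these RANGE RIGHT boundaries, batches 102 and 103 stay in partitions 3 and 4. In practice the partition boundaries would need to be maintained as new batches arrive.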

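If you are deleting a large portion of the table, it may be better to use CTAS to copy the data you want to keep into a new table, for example: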

-- Keep the most recent two BATCHIDS
CREATE TABLE dbo.MyTable2
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH( BATCHID )
    -- Add partition scheme here if required
)
AS
SELECT  *
FROM dbo.MyTable
WHERE BATCHID IN ( SELECT TOP 2 BATCHID FROM dbo.MyTable GROUP BY BATCHID ORDER BY BATCHID DESC )  -- two distinct batches
OPTION ( LABEL = 'CTAS : Keep top two BATCHIDs' );
GO

-- Rename or DROP old table
RENAME OBJECT dbo.MyTable TO MyTable_Old;
RENAME OBJECT dbo.MyTable2 TO MyTable;
GO

-- Optionally DROP MyTable_Old if everything has been successful
-- DROP TABLE MyTable_Old

This technique is described in more detail here.