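I want to delete all rows in a table whose batchId (run number) is older than the two most recent batchIds. In a regular SQL database I could probably do this with a query like: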
WITH CTE AS(
SELECT
*,
DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN>2
But according to this, that is not allowed in SQL Data Warehouse. I'm looking for an alternative here.
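To make the CTE's intent concrete, here is a small local sketch using Python's built-in sqlite3 module. It runs on SQLite (3.25 or later, for window functions), not Azure SQL Data Warehouse, and the sample rows and the payload column are hypothetical. It shows that DENSE_RANK() gives every row of the same batch the same rank, so RN > 2 selects exactly the batches older than the newest two.

```python
import sqlite3

# Hypothetical sample data: batches 1 and 3 have two rows each, batch 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (BATCHID INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")],
)

# Same ranking expression as the question's CTE: ties share a rank, so RN counts batches.
ranked = conn.execute(
    """
    SELECT BATCHID, DENSE_RANK() OVER (ORDER BY BATCHID DESC) AS RN
    FROM MyTable
    ORDER BY BATCHID DESC
    """
).fetchall()

# Batches the CTE-based DELETE would remove: everything ranked below the newest two.
batches_to_delete = sorted({batch for batch, rn in ranked if rn > 2})
print(ranked)
print(batches_to_delete)
```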
Answer 0 (score: 1)
You can try using a JOIN:
delete d from MyTable d
join
(
SELECT
*,
RN = DENSE_RANK() OVER(ORDER BY BATCH_ID DESC)
FROM MyTable
)A on d.batch_id=A.batch_id where RN >2
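The window function in this query decides which rows get deleted, so it is worth checking what each variant flags. This local sketch uses Python's sqlite3 (SQLite 3.25+, not Azure SQL Data Warehouse) with hypothetical data. ROW_NUMBER() partitioned by batch numbers rows inside each batch, so RN > 2 hits any batch with three or more rows. DENSE_RANK() over the whole table ranks batches, so RN > 2 hits the batches older than the newest two.

```python
import sqlite3

# Hypothetical data: batch 3 is the newest and also the only batch with 3 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (batch_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (3, "d"), (3, "e")],
)


def batches_with_rn_over_2(window_expr):
    """Return the distinct batch_ids that have at least one row with RN > 2."""
    sql = f"""
        SELECT DISTINCT batch_id
        FROM (SELECT batch_id, {window_expr} AS RN FROM MyTable)
        WHERE RN > 2
        ORDER BY batch_id
    """
    return [row[0] for row in conn.execute(sql)]


# Numbering rows within each batch flags the largest batch, not the oldest ones.
per_batch = batches_with_rn_over_2(
    "ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY batch_id DESC)"
)
# Ranking batches across the whole table flags every batch older than the newest two.
across_batches = batches_with_rn_over_2("DENSE_RANK() OVER (ORDER BY batch_id DESC)")
print(per_batch, across_batches)
```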
Answer 1 (score: 0)
You could try:
delete t from mytable t
where batchId < (select max(batchid) from mytable);
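As a quick local check of what this first query keeps, the sketch below runs the same condition in SQLite through Python's sqlite3 (not Azure SQL Data Warehouse; sample rows are hypothetical). SQLite has no `DELETE t FROM mytable t` alias form, so a plain `DELETE FROM` is used. Only the single newest batch survives.

```python
import sqlite3

# Hypothetical data: three batches, batch 2 has two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (batchId INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d")],
)

# Same predicate as the answer: remove every row older than the newest batch.
conn.execute("DELETE FROM mytable WHERE batchId < (SELECT MAX(batchId) FROM mytable)")

remaining = [row[0] for row in conn.execute("SELECT batchId FROM mytable ORDER BY batchId")]
print(remaining)  # [3]
```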
Oh, if you want to keep the two most recent, maybe you can use:
delete t from mytable t
where batchId < (select batchid
from mytable
group by batchid
order by batchid desc
limit 1 offset 1
);
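`LIMIT ... OFFSET` is not T-SQL syntax, but the logic can be exercised in SQLite through Python's sqlite3 as a sketch (hypothetical data; `remaining_after_delete` is a helper made up for this check). The subquery needs `ORDER BY batchId DESC` so that `OFFSET 1` lands on the second-newest batch. When there is only one batch, the subquery returns no row, the comparison with NULL is never true, and nothing is deleted.

```python
import sqlite3

DELETE_ALL_BUT_TWO_NEWEST = """
    DELETE FROM mytable
    WHERE batchId < (SELECT batchId
                     FROM mytable
                     GROUP BY batchId
                     ORDER BY batchId DESC  -- without this, the row OFFSET 1 picks is not guaranteed
                     LIMIT 1 OFFSET 1)
"""


def remaining_after_delete(batch_ids):
    """Hypothetical helper: load batch_ids, run the DELETE, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (batchId INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(b,) for b in batch_ids])
    conn.execute(DELETE_ALL_BUT_TWO_NEWEST)
    return sorted(row[0] for row in conn.execute("SELECT batchId FROM mytable"))


print(remaining_after_delete([1, 2, 2, 3, 4, 4]))  # [3, 4, 4]
print(remaining_after_delete([7, 7]))  # [7, 7]: only one batch, so nothing is deleted
```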
Answer 2 (score: 0)
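Azure SQL Data Warehouse supports only a limited T-SQL surface area. CTEs for DELETE operations and DELETE statements with a FROM clause are not supported and produce the following error:
Msg 100029, Level 16, State 1, Line 1
A FROM clause is currently not supported in a DELETE statement.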
It does support subqueries, so one way to write the statement is:
DELETE dbo.MyTable
WHERE BATCHID NOT IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC );
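Because each batch usually spans many rows, whether the subquery uses DISTINCT matters: `TOP 2` on its own counts rows, not batches. The sketch below checks this in SQLite via Python's sqlite3, where `LIMIT 2` plays the role of T-SQL's `TOP 2` (not Azure SQL Data Warehouse; data and the `keep_newest_two` helper are hypothetical). Without DISTINCT, both "top 2" rows can come from the newest batch, and the second-newest batch is deleted too.

```python
import sqlite3


def keep_newest_two(rows, distinct):
    """Hypothetical helper: run the NOT IN delete and return the surviving batch ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (BATCHID INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    keyword = "DISTINCT" if distinct else ""
    # LIMIT 2 stands in for T-SQL's TOP 2.
    conn.execute(
        f"""
        DELETE FROM MyTable
        WHERE BATCHID NOT IN (SELECT {keyword} BATCHID FROM MyTable
                              ORDER BY BATCHID DESC LIMIT 2)
        """
    )
    return sorted({row[0] for row in conn.execute("SELECT BATCHID FROM MyTable")})


rows = [(1, "a"), (2, "b"), (3, "c"), (3, "d")]
print(keep_newest_two(rows, distinct=True))   # [2, 3]
print(keep_newest_two(rows, distinct=False))  # [3]: both "top 2" rows belong to batch 3
```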
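Azure SQL Data Warehouse supports this syntax; I have tested it. I'm not sure how efficient it will be on billions of rows. You could also consider partition switching.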
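If you are deleting a large portion of the table, it may be better to use CTAS to copy the data you want to keep into a new table, for example: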
-- Keep the most recent two BATCHIDS
CREATE TABLE dbo.MyTable2
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = HASH( BATCHID )
-- Add partition scheme here if required
)
AS
SELECT *
FROM dbo.MyTable
WHERE BATCHID IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC )
OPTION ( LABEL = 'CTAS : Keep top two BATCHIDs' );
GO
-- Rename or DROP old table
RENAME OBJECT dbo.MyTable TO MyTable_Old;
RENAME OBJECT dbo.MyTable2 TO MyTable;
GO
-- Optionally DROP MyTable_Old if everything has been successful
-- DROP TABLE MyTable_Old
This technique is described in more detail here.
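The copy-then-swap pattern can be rehearsed end to end in SQLite via Python's sqlite3. This is only a sketch of the sequence, not Azure SQL Data Warehouse: SQLite's `CREATE TABLE ... AS SELECT` stands in for CTAS (there are no distribution or columnstore options), `ALTER TABLE ... RENAME TO` stands in for `RENAME OBJECT`, `LIMIT 2` stands in for `TOP 2`, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data: three batches, the newest (3) has two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (BATCHID INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (3, "d")],
)

# 1. Copy only the rows to keep (CREATE TABLE ... AS SELECT in place of CTAS).
conn.execute(
    """
    CREATE TABLE MyTable2 AS
    SELECT * FROM MyTable
    WHERE BATCHID IN (SELECT DISTINCT BATCHID FROM MyTable
                      ORDER BY BATCHID DESC LIMIT 2)
    """
)
# 2. Swap the names (ALTER TABLE ... RENAME TO in place of RENAME OBJECT).
conn.execute("ALTER TABLE MyTable RENAME TO MyTable_Old")
conn.execute("ALTER TABLE MyTable2 RENAME TO MyTable")
# 3. Drop the old copy once the new table looks right.
conn.execute("DROP TABLE MyTable_Old")

kept = sorted(row[0] for row in conn.execute("SELECT BATCHID FROM MyTable"))
print(kept)  # [2, 3, 3]
```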