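I have some near-duplicate data in my database, duplicated on these 5 columns: Date, Code, Expiry, TheType, Strike. There are more columns, but they don't count toward marking a record as a duplicate. In each case I want to keep only one record, and the one I want to keep is the record whose mtm column is closest to its checkprice column (i.e. the one that minimizes abs(mtm-checkprice)). So if I could order the partition by that expression, I think the CTE below is very close. The way I tried it gives me the error Invalid column name 'diff'.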
WITH CTE AS(
SELECT *, ABS(Mtm - checkprice) as diff,
RN = ROW_NUMBER()OVER(PARTITION BY Date, Strike, Mtm, /* ALL THE OTHER COLUMN NAMES */
ORDER BY diff DESC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
Any ideas on how to fix this?
Answer 0 (score: 1)
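Use ABS(mtm-checkprice) itself in the ORDER BY of the ROW_NUMBER:
WITH CTE AS(
SELECT *, Diff = ABS(mtm-checkprice),
RN = ROW_NUMBER()OVER(PARTITION BY Date, Code, Expiry, TheType, Strike
ORDER BY ABS(mtm-checkprice) ASC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
You cannot access Diff inside the ROW_NUMBER, because the alias is defined in the same SELECT list; it can only be referenced outside the CTE. Note also that the partition is on the five key columns only, and the order is ASC, so the row with the smallest difference gets RN = 1 and is the one kept.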
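A minimal sketch for checking the answer's logic, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for ROW_NUMBER). The sample rows and the helper names (make_db, rows_to_delete, dedupe) are hypothetical. It also shows the alternative implied by the last sentence: compute Diff in one CTE and reference the alias in the next query, where it is visible. SQLite cannot delete through a CTE the way SQL Server can, so the delete step here targets rowid instead.

```python
import sqlite3

# Hypothetical sample data: two groups sharing the five key columns.
ROWS = [
    # Date,        Code,  Expiry,       TheType, Strike, mtm, checkprice
    ("2020-01-02", "ABC", "2020-03-20", "C", 100.0, 5.0, 5.4),
    ("2020-01-02", "ABC", "2020-03-20", "C", 100.0, 5.3, 5.4),  # closest in ABC group
    ("2020-01-02", "ABC", "2020-03-20", "C", 100.0, 6.0, 5.4),
    ("2020-01-02", "XYZ", "2020-03-20", "P", 50.0, 2.0, 1.0),
    ("2020-01-02", "XYZ", "2020-03-20", "P", 50.0, 1.1, 1.0),   # closest in XYZ group
]

KEYS = "Date, Code, Expiry, TheType, Strike"

# As in the answer: order each partition by the expression itself,
# not by the Diff alias defined in the same SELECT list.
RANKED_INLINE = f"""
SELECT rowid AS rid, ABS(mtm - checkprice) AS Diff,
       ROW_NUMBER() OVER (PARTITION BY {KEYS}
                          ORDER BY ABS(mtm - checkprice) ASC) AS RN
FROM FullStats
"""

# Alternative: define Diff in a first CTE, then use the alias in the outer query.
RANKED_TWO_STAGE = f"""
WITH WithDiff AS (
    SELECT rowid AS rid, FullStats.*, ABS(mtm - checkprice) AS Diff
    FROM FullStats
)
SELECT rid, Diff,
       ROW_NUMBER() OVER (PARTITION BY {KEYS} ORDER BY Diff ASC) AS RN
FROM WithDiff
"""


def make_db():
    """Create an in-memory FullStats table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FullStats (Date TEXT, Code TEXT, Expiry TEXT, "
        "TheType TEXT, Strike REAL, mtm REAL, checkprice REAL)"
    )
    conn.executemany("INSERT INTO FullStats VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def rows_to_delete(conn, ranked_sql):
    """Return the sorted rowids that a ranking query marks with RN > 1."""
    return sorted(rid for rid, _diff, rn in conn.execute(ranked_sql) if rn > 1)


def dedupe(conn):
    """Delete every row except the closest one per key, then list what is left."""
    conn.execute(
        "DELETE FROM FullStats WHERE rowid IN "
        f"(SELECT rid FROM ({RANKED_INLINE}) AS r WHERE RN > 1)"
    )
    return conn.execute(
        "SELECT Code, mtm, checkprice FROM FullStats ORDER BY Code"
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(rows_to_delete(db, RANKED_INLINE))
    print(rows_to_delete(db, RANKED_TWO_STAGE))
    print(dedupe(db))
```

Both ranking queries should mark the same rows for deletion; the two-stage form only works because Diff is referenced in a later query than the one that defines it.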