Consider the table mdl_files, which has the following fields: id, contenthash, timecreated, filesize.
This table stores attachment files.
We treat all rows with the same content hash as duplicates, and I want to keep only the oldest row (if the dates are equal, keep the first one). How can I do that?
The following query:
SELECT
id,
contenthash,
filesize,
to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash;
returns:
2480229 00002e87605311feb82b70473b61e81f0223c774 18178 2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0 1293 2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-10
1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-17
71568 0003c6aec5835964870902d697c06d21abf76bf7 139439 2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c 398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
I would like to get this result set:
2480229 00002e87605311feb82b70473b61e81f0223c774 18178 2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0 1293 2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-10
71568 0003c6aec5835964870902d697c06d21abf76bf7 139439 2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c 398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
I would like to delete the following duplicate rows, that is, the rows returned by the query that do not appear in the desired result set:
1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
Answer 0: (score: 3)
Use DISTINCT ON:
SELECT DISTINCT ON (contenthash)
id,
contenthash,
filesize,
to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash, timecreated, id;
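DISTINCT ON is a Postgres extension that guarantees exactly one row is returned for each unique combination of the keys in the parentheses. The row that is kept is the first one encountered according to the ORDER BY clause, so here it is the row with the smallest timecreated for each contenthash, with id breaking ties.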
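To make the "first row per key" rule concrete, here is a small Python sketch that mimics what the query above selects. It only illustrates the semantics and says nothing about how Postgres executes DISTINCT ON. The sample rows, and the helper name first_row_per_hash, are made up for this example.

```python
from itertools import groupby

# Hypothetical sample rows: (id, contenthash, timecreated as epoch seconds).
rows = [
    (4, "bbb", 200),
    (1, "aaa", 300),
    (2, "aaa", 100),
    (3, "aaa", 100),  # same timecreated as id 2, so id decides
    (5, "ccc", 50),
]


def first_row_per_hash(rows):
    """Sort by (contenthash, timecreated, id), like the ORDER BY clause,
    then keep the first row of each contenthash group."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[1])]


kept = first_row_per_hash(rows)
print(kept)  # [(2, 'aaa', 100), (4, 'bbb', 200), (5, 'ccc', 50)]
```

Note that the leading ORDER BY expressions must match the DISTINCT ON keys, which is why contenthash comes first in the sort key.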
Answer 1: (score: 2)
You can try the ROW_NUMBER() window function to number the rows within each group of duplicates, then keep only the row numbered 1.
SELECT t.*
FROM (
SELECT
id,
contenthash,
filesize,
-- id breaks ties when two rows share the same timecreated
ROW_NUMBER() OVER (PARTITION BY contenthash, filesize ORDER BY timecreated, id) AS rn
FROM mdl_files
) t
WHERE t.rn = 1;
If you want to DELETE the duplicate rows, you can use EXISTS in the WHERE clause.
DELETE
FROM mdl_files f WHERE EXISTS (
SELECT 1
FROM (
SELECT
id,
contenthash,
filesize,
-- rn = 1 is the row to keep; id breaks ties on equal timecreated
ROW_NUMBER() OVER (PARTITION BY contenthash, filesize ORDER BY timecreated, id) AS rn
FROM mdl_files
) t
WHERE t.rn > 1 AND t.id = f.id
);
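Before running a delete like this against real data, it can help to try the pattern on a throwaway table. The sketch below does that with Python's built-in sqlite3 module, used here only as a stand-in for Postgres (it assumes an SQLite build with window function support, 3.25 or later). The rows are invented, and the sketch partitions by contenthash alone, following the question's definition of a duplicate, and uses IN instead of EXISTS for brevity.

```python
import sqlite3

# In-memory stand-in for mdl_files with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_files (id INTEGER PRIMARY KEY, contenthash TEXT, "
    "filesize INTEGER, timecreated INTEGER)"
)
conn.executemany(
    "INSERT INTO mdl_files VALUES (?, ?, ?, ?)",
    [
        (1, "aaa", 10, 300),
        (2, "aaa", 10, 100),
        (3, "aaa", 10, 100),  # ties with id 2 on timecreated
        (4, "bbb", 20, 200),
    ],
)

# Every row after the first in its contenthash group is a duplicate.
DUPLICATES = """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY contenthash
                                  ORDER BY timecreated, id) AS rn
        FROM mdl_files
    ) AS t
    WHERE rn > 1
"""

# Preview what would be removed, then delete exactly those ids.
doomed = sorted(r[0] for r in conn.execute(DUPLICATES))
conn.execute(f"DELETE FROM mdl_files WHERE id IN ({DUPLICATES})")
remaining = [r[0] for r in conn.execute("SELECT id FROM mdl_files ORDER BY id")]

print(doomed)     # [1, 3]
print(remaining)  # [2, 4]
```

Running the preview query first gives a chance to compare the ids against the rows you expect to lose before anything is deleted.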