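I would like to use SQL to accomplish the following:
1) Find the number of duplicate records
For each value of the "snapshot date" column, count the IDs that also appear on the previous date
2) Find the number of records added
3) Find the number of records removed
Current table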
snapshot_date | unique ID
2018-08-15    | 1
2018-08-15    | 2
2018-08-15    | 3
2018-08-15    | 4
2018-08-15    | 5
2018-08-16    | 1
2018-08-16    | 3
2018-08-16    | 4
2018-08-16    | 6
2018-08-16    | 7
2018-08-16    | 8
2018-08-16    | 9
2018-08-17    | 3
2018-08-17    | 8
2018-08-17    | 10
2018-08-17    | 11
2018-08-17    | 12
2018-08-17    | 13
Desired table
snapshot date | count | # of dupe from previous date | sum of ID added | sum of ID removed
2018-08-15    | 5     | N/A                          | N/A             | N/A
2018-08-16    | 7     | 3                            | 4               | 2
2018-08-17    | 6     | 2                            | 4               | 5
If anyone knows a script that produces the desired table, I would really appreciate it! Thanks in advance!
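The arithmetic behind the desired table can be sketched with plain set operations, which makes it easy to check the expected numbers by hand. This is only an illustration in Python, not SQL: the function name `snapshot_diff` and the `rows` list are hypothetical, and the sketch assumes each snapshot is compared with the previous snapshot present in the data. Note that "sum of ID added" and "sum of ID removed" are computed as counts of IDs, not sums of their values.

```python
from collections import defaultdict

# Sample rows from the question: (snapshot_date, unique ID).
rows = [
    ("2018-08-15", 1), ("2018-08-15", 2), ("2018-08-15", 3),
    ("2018-08-15", 4), ("2018-08-15", 5),
    ("2018-08-16", 1), ("2018-08-16", 3), ("2018-08-16", 4),
    ("2018-08-16", 6), ("2018-08-16", 7), ("2018-08-16", 8),
    ("2018-08-16", 9),
    ("2018-08-17", 3), ("2018-08-17", 8), ("2018-08-17", 10),
    ("2018-08-17", 11), ("2018-08-17", 12), ("2018-08-17", 13),
]


def snapshot_diff(rows):
    """Return (date, count, dupes, added, removed) per snapshot date.

    The first date has nothing to compare against, so its last three
    fields are None (the "N/A" of the desired table).
    """
    ids_by_date = defaultdict(set)
    for snapshot_date, unique_id in rows:
        ids_by_date[snapshot_date].add(unique_id)

    report = []
    previous = None
    for snapshot_date in sorted(ids_by_date):  # ISO dates sort chronologically
        current = ids_by_date[snapshot_date]
        if previous is None:
            report.append((snapshot_date, len(current), None, None, None))
        else:
            report.append((
                snapshot_date,
                len(current),
                len(current & previous),  # IDs present in both snapshots
                len(current - previous),  # IDs new in this snapshot
                len(previous - current),  # IDs gone since the previous snapshot
            ))
        previous = current
    return report


for line in snapshot_diff(rows):
    print(line)
```

On the sample data this reproduces the three rows of the desired table, for example 3 duplicates (IDs 1, 3, 4), 4 additions and 2 removals (IDs 2, 5) for 2018-08-16.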
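Answer 0: (score: 3)
If you are using MySQL (which does not support the analytic functions LEAD and LAG, at least in versions before 8.0), one approach is to do a series of self-joins and then aggregate to get the result you want: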
    SELECT
        t1.snapshot_date,
        t1.count,
        t1.previous_dupe,
        t1.num_added,
        t2.num_subtracted
    FROM
    (
        -- Join each row to the same ID on the previous day:
        -- a match is a duplicate, no match is a newly added ID.
        SELECT
            t1.snapshot_date,
            COUNT(*) AS count,
            COUNT(t2.snapshot_date) AS previous_dupe,
            COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_added
        FROM yourTable t1
        LEFT JOIN yourTable t2
            ON t1.snapshot_date = DATE_ADD(t2.snapshot_date, INTERVAL 1 DAY) AND
               t1.uniqueID = t2.uniqueID
        GROUP BY t1.snapshot_date
    ) t1
    LEFT JOIN
    (
        -- Join each row to the same ID on the next day:
        -- no match means the ID was removed, reported against the next day.
        SELECT
            DATE_ADD(t1.snapshot_date, INTERVAL 1 DAY) AS snapshot_date,
            COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_subtracted
        FROM yourTable t1
        LEFT JOIN yourTable t2
            ON t1.snapshot_date = DATE_SUB(t2.snapshot_date, INTERVAL 1 DAY) AND
               t1.uniqueID = t2.uniqueID
        GROUP BY t1.snapshot_date
    ) t2
        ON t1.snapshot_date = t2.snapshot_date;
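Note: there are slight differences between my results and your expected results, partly because of arithmetic errors on your side and partly because of how the logic in the query works. I report 5 newly added IDs for the earliest date, since conceptually there is no earlier record and all 5 values are technically new.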
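One way to see what the query above returns on the sample data is to run a port of it in SQLite through Python's `sqlite3` module. This is a sketch under assumptions, not the original MySQL: `DATE_ADD` and `DATE_SUB` are replaced by SQLite's `date(x, '+1 day')` and `date(x, '-1 day')`, dates are stored as ISO text, and the aliases (`cur`, `prev`, `old`, `nxt`, `added`, `removed`, `total_count`) are renamed purely for readability.

```python
import sqlite3

# Sample rows from the question: (snapshot_date, uniqueID).
rows = [
    ("2018-08-15", 1), ("2018-08-15", 2), ("2018-08-15", 3),
    ("2018-08-15", 4), ("2018-08-15", 5),
    ("2018-08-16", 1), ("2018-08-16", 3), ("2018-08-16", 4),
    ("2018-08-16", 6), ("2018-08-16", 7), ("2018-08-16", 8),
    ("2018-08-16", 9),
    ("2018-08-17", 3), ("2018-08-17", 8), ("2018-08-17", 10),
    ("2018-08-17", 11), ("2018-08-17", 12), ("2018-08-17", 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)

# SQLite port of the two-subquery self-join (dialect change is an assumption).
QUERY = """
SELECT added.snapshot_date,
       added.total_count,
       added.previous_dupe,
       added.num_added,
       removed.num_subtracted
FROM (
    -- Match each row against the same ID one day earlier.
    SELECT cur.snapshot_date,
           COUNT(*) AS total_count,
           COUNT(prev.snapshot_date) AS previous_dupe,
           COUNT(CASE WHEN prev.snapshot_date IS NULL THEN 1 END) AS num_added
    FROM yourTable cur
    LEFT JOIN yourTable prev
           ON cur.snapshot_date = date(prev.snapshot_date, '+1 day')
          AND cur.uniqueID = prev.uniqueID
    GROUP BY cur.snapshot_date
) AS added
LEFT JOIN (
    -- Match each row against the same ID one day later; unmatched rows
    -- are removals, reported against the following day.
    SELECT date(old.snapshot_date, '+1 day') AS snapshot_date,
           COUNT(CASE WHEN nxt.snapshot_date IS NULL THEN 1 END) AS num_subtracted
    FROM yourTable old
    LEFT JOIN yourTable nxt
           ON old.snapshot_date = date(nxt.snapshot_date, '-1 day')
          AND old.uniqueID = nxt.uniqueID
    GROUP BY old.snapshot_date
) AS removed
    ON added.snapshot_date = removed.snapshot_date
ORDER BY added.snapshot_date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With the sample rows, the first date comes back with 0 duplicates, 5 added and NULL removed, while 2018-08-16 and 2018-08-17 give 3/4/2 and 2/4/5, the same values as the desired table for those two dates.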
This problem is particularly ugly because we need to self-join twice, in opposite directions, in two separate subqueries.
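Where window functions are available, the double self-join can probably be avoided. The sketch below is an alternative, not the method of this answer: it uses `LAG` in SQLite (3.25 or later) through Python's `sqlite3` module, and the CTE names `flagged` and `daily` are hypothetical. It assumes that each (snapshot_date, uniqueID) pair occurs once and that snapshots exist for consecutive days; with gaps between dates, the removed count would be measured against the last available snapshot instead of the previous calendar day.

```python
import sqlite3

# Sample rows from the question: (snapshot_date, uniqueID).
rows = [
    ("2018-08-15", 1), ("2018-08-15", 2), ("2018-08-15", 3),
    ("2018-08-15", 4), ("2018-08-15", 5),
    ("2018-08-16", 1), ("2018-08-16", 3), ("2018-08-16", 4),
    ("2018-08-16", 6), ("2018-08-16", 7), ("2018-08-16", 8),
    ("2018-08-16", 9),
    ("2018-08-17", 3), ("2018-08-17", 8), ("2018-08-17", 10),
    ("2018-08-17", 11), ("2018-08-17", 12), ("2018-08-17", 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)

QUERY = """
WITH flagged AS (
    -- For every row, the most recent earlier date on which the same ID appeared.
    SELECT snapshot_date,
           LAG(snapshot_date) OVER (
               PARTITION BY uniqueID ORDER BY snapshot_date
           ) AS prev_date
    FROM yourTable
),
daily AS (
    -- An ID is a duplicate when it was also present exactly one day earlier.
    SELECT snapshot_date,
           COUNT(*) AS total_count,
           SUM(CASE WHEN prev_date = date(snapshot_date, '-1 day')
                    THEN 1 ELSE 0 END) AS dupes
    FROM flagged
    GROUP BY snapshot_date
)
SELECT snapshot_date,
       total_count,
       dupes,
       total_count - dupes AS added,
       -- removed = size of the previous snapshot minus what survived from it
       LAG(total_count) OVER (ORDER BY snapshot_date) - dupes AS removed
FROM daily
ORDER BY snapshot_date
"""

report = conn.execute(QUERY).fetchall()
for row in report:
    print(row)
```

The design choice here is to derive both other counts from the duplicates: added is the current count minus the duplicates, and removed is the previous day's count minus the duplicates, so only one pass with `LAG` per ID is needed.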
Answer 1: (score: 3)
Here is my take on it, based on SQL Server.
    SELECT  snapshot_date     = COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date)),
            [count]           = COUNT(c.snapshot_date),
            dup_from_prev_day = SUM(CASE WHEN c.snapshot_date IS NOT NULL
                                          AND p.snapshot_date IS NOT NULL
                                         THEN 1 END),
            sum_of_id_added   = SUM(CASE WHEN c.snapshot_date IS NOT NULL
                                          AND p.snapshot_date IS NULL
                                         THEN 1 END),
            sum_of_id_removed = SUM(CASE WHEN c.snapshot_date IS NULL
                                          AND p.snapshot_date IS NOT NULL
                                         THEN 1 END)
    FROM    yourTable c                 -- current
            FULL OUTER JOIN yourTable p -- previous
                 ON c.snapshot_date = DATEADD(DAY, 1, p.snapshot_date)
                AND c.uniqueID = p.uniqueID
    GROUP BY COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date))
    HAVING  COUNT(c.snapshot_date) > 0
    /* RESULT :
    snapshot_date   count   dup_from_prev_day   sum_of_id_added   sum_of_id_removed
    2018-08-15      5       NULL                5                 NULL
    2018-08-16      7       3                   4                 2
    2018-08-17      6       2                   4                 5
    */