How can I find duplicates and prepend a random number to them so that they are no longer duplicates?
Sample table:
primary_id  student_id  student_name
1           80          John Terry
2           81          Didier Drogba
3           80          John Terry
4           82          Frank Lampard
5           80          John Terry
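I want to get rid of the duplicates by adding a random number to the name of each duplicate. For example, in the scenario above I want to rename the student_name
in row 3 to 112233_DUP_John Terry
and in row 5 to 668877_DUP_John Terry.
Note that the first entry of each set of duplicates stays unchanged. In this case, row 1 stays as it is.
The rename format is: 6_digit_random_number
+ _DUP_
+ Existing Student Name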
So far, I can fetch the duplicates with the SQL below:
SELECT student_id, student_name
FROM (
    SELECT student_id, student_name, count(*)
    FROM student
    GROUP BY student_id, student_name
    HAVING count(*) > 1
    ORDER BY count DESC
) AS duplicates
I know I can also generate a random number in SQL, but I can't figure out how to attach it to the duplicate entries.
I am running a PostgreSQL database.
Answer 0 (score: 3)
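First, get the duplicate rows with a window function instead of the GROUP BY approach, e.g.:
SELECT
    primary_id, student_id, student_name
FROM
(
    SELECT
        -- ORDER BY primary_id makes the lowest primary_id the copy that is kept
        row_number() OVER (PARTITION BY student_id, student_name
                           ORDER BY primary_id) AS dup_no,
        primary_id, student_id, student_name
    FROM student
) dup
WHERE dup.dup_no > 1;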
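Then combine that with UPDATE ... FROM
to update only the duplicates:
UPDATE student
-- the FM prefix suppresses the leading blank that to_char reserves for the sign
SET student_name = to_char(dupstudents.dup_no, 'FM000000') || '_DUP_' || student.student_name
FROM (
    SELECT
        -- ORDER BY primary_id keeps the first entry (lowest primary_id) unchanged
        row_number() OVER (PARTITION BY student_id, student_name
                           ORDER BY primary_id) AS dup_no,
        primary_id, student_id, student_name
    FROM student
) dupstudents
WHERE student.primary_id = dupstudents.primary_id
AND dupstudents.dup_no > 1;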
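Example: http://sqlfiddle.com/#!15/5b1b8/9
I haven't bothered with the "random ID" part; I simply used each duplicate's position within its group. You can replace that with a suitable call such as (random()*10^6)::integer
or anything else, but beware of collisions between random values.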
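As a rough sketch of that substitution (not tested against the fiddle), the update could use a zero-padded random number instead of the duplicate position. It assumes the table is named student as in the question and that primary_id is unique; floor() keeps the value in the range 0 to 999999 and lpad() pads it to six digits.

```sql
-- Sketch under the assumptions above: table "student", unique primary_id.
UPDATE student AS s
SET student_name = lpad(floor(random() * 1000000)::int::text, 6, '0')
                   || '_DUP_' || s.student_name
FROM (
    SELECT primary_id,
           -- number the copies; the lowest primary_id in each group gets 1
           row_number() OVER (PARTITION BY student_id, student_name
                              ORDER BY primary_id) AS dup_no
    FROM student
) AS d
WHERE s.primary_id = d.primary_id
  AND d.dup_no > 1;  -- leave the first copy untouched
```

random() is evaluated once per updated row, so each duplicate normally gets its own prefix. Two copies of the same name can still draw the same number by chance, so it is worth re-running the duplicate check afterwards.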
Answer 1 (score: 0)
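Try this:
select student_id, r_n, student_name,
       -- FM drops the leading blank that to_char would otherwise emit
       CASE WHEN r_n <> 1 THEN to_char(r_n, 'FM000000') || '_DUP_' ELSE '' END || student_name AS new_name
FROM (SELECT *,
             -- partition on both columns; primary_id decides which copy is first
             row_number() OVER (PARTITION BY student_id, student_name ORDER BY primary_id) AS r_n
      FROM student) AS t1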
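With a random number:
select student_id, r_n, student_name,
       -- floor() prevents rounding up to 1000000, which would not fit the 6-digit format
       CASE WHEN r_n <> 1 THEN to_char(floor(random() * 1000000), 'FM000000') || '_DUP_' ELSE '' END || student_name AS new_name
FROM (SELECT *,
             row_number() OVER (PARTITION BY student_id, student_name ORDER BY primary_id) AS r_n
      FROM student) AS t1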
In a single statement without a subquery:
select student_id,
       row_number() OVER (PARTITION BY student_id, student_name ORDER BY primary_id) AS r_n,
       student_name,
       CASE WHEN row_number() OVER (PARTITION BY student_id, student_name ORDER BY primary_id) <> 1
            THEN to_char(floor(random() * 1000000), 'FM000000') || '_DUP_' ELSE '' END || student_name AS new_name
from student
;
Answer 2 (score: 0)
with cte as (
    -- every copy after the first in each (student_id, student_name) group
    select primary_id, student_id, student_name
    from (
        select row_number() over (partition by student_id, student_name
                                  order by primary_id) as dup_no,
               primary_id, student_id, student_name
        from student
    ) dup
    where dup.dup_no > 1
), cte2 as (
    -- build the new name: 6-digit random number + _DUP_ + existing name
    select to_char(floor(random() * 1000000), 'FM000000') || '_DUP_' || student_name
               as duplicate_student_name,
           primary_id, student_id
    from student
    where primary_id in (select primary_id from cte)
)
update student as v
set student_name = s.duplicate_student_name
from cte2 as s
where v.primary_id = s.primary_id;
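To check the result after running an update like the one above, the grouping query from the question can be reused. This is only a sketch and assumes the same student table; the copies alias is made up for illustration. If the rename worked and no two duplicates happened to draw the same random number, it should return no rows.

```sql
-- Any row returned here is still a duplicate (for example after a random collision).
SELECT student_id, student_name, count(*) AS copies
FROM student
GROUP BY student_id, student_name
HAVING count(*) > 1;
```

If rows do come back, running the update again renames the remaining duplicates, at the cost of a second prefix on those names.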