How to use Active Record to find records with duplicate data in any column

Date: 2015-04-08 13:47:21

Tags: sql postgresql activerecord duplicates aggregate-functions

How can I find records that have a duplicate value in any column, using Active Record or SQL?

SELECT leads.id, leads.name, leads.email, leads.created_at, array_agg(tn2.id) as ids
FROM "leads" join leads tn2
    on leads.name = tn2.name
      or leads.cpf_cnpj = tn2.cpf_cnpj
      or leads.email = tn2.email
      or leads.phone -> 'cellphone' = tn2.phone -> 'cellphone'
      or leads.phone -> 'residence' = tn2.phone -> 'residence'
      or leads.phone -> 'commercial' = tn2.phone -> 'commercial' 
GROUP BY leads.id  ORDER BY leads.created_at DESC

With array_agg I only want the ids of the duplicate records, but it returns ids from all records. I'm currently using PostgreSQL.
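Judging by the `id <> l.id` condition the answer below adds, one plausible cause of the symptom is that a self-join without such a condition pairs every row with itself, so every lead shows up in the result and each array contains the lead's own id. The sketch below illustrates this with Python's built-in sqlite3 module. Assumptions: SQLite stands in for PostgreSQL, the phone hash is reduced to one hypothetical `cellphone` column, and the sample rows and the `matches` helper are made up for illustration.

```python
import sqlite3

# Hypothetical sample data. SQLite stands in for PostgreSQL, and the phone
# hash is reduced to a single cellphone column to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT, email TEXT, cellphone TEXT)"
)
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "ana@example.com", "111"),
        (2, "Bruno", "bruno@example.com", "222"),
        (3, "Ana", "other@example.com", "333"),  # same name as lead 1
    ],
)


def matches(exclude_self):
    """Map each lead id to the sorted ids it is joined with by the OR self-join."""
    sql = """
        SELECT l.id, l2.id
        FROM   leads l
        JOIN   leads l2
          ON  (l2.name = l.name OR l2.email = l.email OR l2.cellphone = l.cellphone)
    """
    if exclude_self:
        # The parentheses above matter: AND binds tighter than OR.
        sql += " AND l2.id <> l.id"
    result = {}
    for lead_id, other_id in conn.execute(sql):
        result.setdefault(lead_id, []).append(other_id)
    return {k: sorted(v) for k, v in result.items()}


print(matches(exclude_self=False))  # every lead is joined with at least itself
print(matches(exclude_self=True))   # only leads 1 and 3 are left
```

With `exclude_self=False`, lead 2 comes back even though no other lead shares any of its values.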

1 answer:

Answer 0 (score: 1)

  

How do I find records that have a duplicate value in any column?

SELECT l.id, l.name, l.email, l.created_at
FROM   leads l
WHERE EXISTS (
   SELECT 1
   FROM   leads 
   WHERE  id <> l.id  -- skip the row itself, which always matches its own values
   AND   (
          name = l.name    
   OR     cpf_cnpj = l.cpf_cnpj
   OR     email = l.email
   OR     phone->'cellphone'  = l.phone->'cellphone'
   OR     phone->'residence'  = l.phone->'residence'
   OR     phone->'commercial' = l.phone->'commercial'
         )
   );
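The EXISTS query can be tried out locally with a small sketch using Python's built-in sqlite3 module. Assumptions: SQLite replaces PostgreSQL, the phone hash is flattened into hypothetical `cellphone`, `residence` and `commercial` columns, and the sample rows and the `DUPES_SQL` name are made up. The subquery aliases the inner table as `l2` only for readability.

```python
import sqlite3

# Assumption: SQLite instead of PostgreSQL; phone hash flattened into columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE leads (
        id INTEGER PRIMARY KEY, name TEXT, cpf_cnpj TEXT, email TEXT,
        cellphone TEXT, residence TEXT, commercial TEXT, created_at TEXT
    )
""")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ana",   "100", "ana@example.com",   "111", None, None, "2015-04-01"),
        (2, "Bruno", "200", "bruno@example.com", "222", None, None, "2015-04-02"),
        (3, "Carla", "300", "carla@example.com", "111", None, None, "2015-04-03"),
        (4, "Davi",  "400", "davi@example.com",  None,  None, None, "2015-04-04"),
    ],
)

# A lead qualifies if at least one *other* lead matches it on any column.
# NULL = NULL is not true, so the missing residence/commercial numbers
# do not turn every lead into a duplicate.
DUPES_SQL = """
    SELECT l.id, l.name, l.email, l.created_at
    FROM   leads l
    WHERE  EXISTS (
        SELECT 1
        FROM   leads l2
        WHERE  l2.id <> l.id
        AND   (   l2.name       = l.name
               OR l2.cpf_cnpj   = l.cpf_cnpj
               OR l2.email      = l.email
               OR l2.cellphone  = l.cellphone
               OR l2.residence  = l.residence
               OR l2.commercial = l.commercial)
    )
    ORDER  BY l.id
"""

rows = conn.execute(DUPES_SQL).fetchall()
print([r[0] for r in rows])  # [1, 3]: leads 1 and 3 share a cellphone
```

Because EXISTS only tests for a match, each qualifying lead is returned once, no matter how many other leads it duplicates.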

But it seems you want something different:

  

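How do I get, for each row, an array of the IDs of the other rows that share a value with it in at least one of several given columns, with the most recently created rows first?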

SELECT l.id, l.name, l.email, l.created_at
     , array_agg(l2.id ORDER BY l2.created_at DESC NULLS LAST) AS dupe_ids
FROM   leads l
JOIN   leads l2 ON l2.id <> l.id  -- inner join: leads without any duplicate drop out
      AND   (
              l2.name = l.name    
       OR     l2.cpf_cnpj = l.cpf_cnpj
       OR     l2.email = l.email
       OR     l2.phone->'cellphone'  = l.phone->'cellphone'
       OR     l2.phone->'residence'  = l.phone->'residence'
       OR     l2.phone->'commercial' = l.phone->'commercial'
             )
GROUP   BY l.id
ORDER   BY l.created_at DESC NULLS LAST;

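This assumes id is the primary key: since PostgreSQL 9.1, grouping by the primary key lets the other columns of l appear in the SELECT list without being listed in GROUP BY.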
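The self-join variant can be sketched in the same setting. Assumptions: Python's sqlite3 in place of PostgreSQL, the phone hash flattened into hypothetical columns, and made-up data and helper names (`PAIRS_SQL`, `dupe_ids`). SQLite has no `array_agg`, so this version sorts the joined pairs in SQL and collects each lead's duplicate ids in Python, playing the role of `array_agg(l2.id ORDER BY l2.created_at DESC)`.

```python
import sqlite3

# Assumption: SQLite instead of PostgreSQL; phone hash flattened into columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE leads (
        id INTEGER PRIMARY KEY, name TEXT, cpf_cnpj TEXT, email TEXT,
        cellphone TEXT, residence TEXT, commercial TEXT, created_at TEXT
    )
""")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ana",   "100", "ana@example.com",   "111", None, None, "2015-04-01"),
        (2, "Ana",   "200", "bruno@example.com", "222", None, None, "2015-04-02"),
        (3, "Carla", "300", "ana@example.com",   "333", None, None, "2015-04-03"),
        (4, "Davi",  "400", "davi@example.com",  "444", None, None, "2015-04-04"),
    ],
)

# Inner join: leads without any duplicate produce no rows and drop out.
# The ORDER BY mirrors both the outer ORDER BY and the ordered array_agg.
PAIRS_SQL = """
    SELECT l.id, l2.id
    FROM   leads l
    JOIN   leads l2 ON l2.id <> l.id
           AND (   l2.name       = l.name
                OR l2.cpf_cnpj   = l.cpf_cnpj
                OR l2.email      = l.email
                OR l2.cellphone  = l.cellphone
                OR l2.residence  = l.residence
                OR l2.commercial = l.commercial)
    ORDER  BY l.created_at DESC, l2.created_at DESC
"""


def dupe_ids():
    """Return {lead id: [ids of matching leads, newest first]}, newest lead first."""
    groups = {}
    for lead_id, other_id in conn.execute(PAIRS_SQL):
        groups.setdefault(lead_id, []).append(other_id)  # dicts keep insertion order
    return groups


print(dupe_ids())  # {3: [1], 2: [1], 1: [3, 2]}
```

NULLS LAST is left out here because every sample row has a created_at, and SQLite already sorts NULLs last in descending order; PostgreSQL sorts them first under DESC, which is why the query above spells out NULLS LAST.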