How can I use SQL to select duplicate records along with counts of related items?

Asked: 2010-05-28 16:40:06

Tags: sql database mysql join

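I know the title of this question is a bit confusing, so bear with me. :)

I have a (MySQL) database with Person records. A Person also has a slug field. Unfortunately, slug fields are not unique. There are a number of duplicate records, i.e., records that have different IDs but the same first name, last name, and slug. A Person may also have 0 or more associated articles, blog entries, and podcast episodes.

If that's confusing, here is a diagram of the structure: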

(Database structure diagram: http://mipadi.cbstaff.com/images/misc/people_db.jpg)

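I want to produce a list of records that match this criterion: duplicate records (i.e., same slug field) for people who also have at least 1 article, blog entry, or podcast episode.

I have a SQL query that lists all records with the same slug field: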

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

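But this includes records for people who may not have at least 1 article, blog entry, or podcast. Can I tweak this to satisfy the second criterion as well?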

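Edit

I updated the database diagram to simplify it and make it clearer what I'm doing. (Note that some of the table names changed; I was previously trying to give a higher-level view of the structure, but it was a bit unclear.)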

5 answers:

Answer 0 (score: 2)

Select P.id, P.first_name, P.last_name, P.slug
From people_person as P
    Join    (
            Select P1.slug
            From people_person As P1
            Where Exists    (
                            Select 1
                            From magazine_author As ma1
                            Where ma1.person_id = P1.id
                            Union All
                            Select 1
                            From podcast_episode_guests As pod1
                            Where pod1.person_id = P1.Id
                            Union All
                            Select 1
                            From blogs_blog_authors As b1
                            Where b1.person_id = P1.Id
                            )
            Group By P1.slug
            Having Count(*) > 1
            ) As dup_slugs
        On dup_slugs.slug = P.slug
Order By P.last_name, P.first_name, P.id

Answer 1 (score: 1)

You can still include a WHERE clause to filter the results:

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
WHERE id IN (SELECT id FROM article)
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

Answer 2 (score: 1)

You can handle it with a HAVING clause:

select Id
        , last_name
        , first_name
        , slug
        , COUNT(*) as Person_Records
    from Person as p
    group by Id
            , last_name
            , first_name
            , slug
        having COUNT(slug) > 1
            and ( 
                select COUNT(*)
                    from Author as a
                    where a.Person_Id = p.Id
            ) > 1
            and (
                select COUNT(*)
                    from Podcast_Guests as pg
                    where pg.Person_Id = p.Id
            ) > 1

I omitted the remaining conditions, since this is just a simple sample.

I hope this helps! =)

Answer 3 (score: 1)

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
WHERE 
 id IN (SELECT person_id from podcast_guests GROUP BY person_id) OR 
 id IN (SELECT person_id from authors GROUP BY person_id) OR 
 [....]
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

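Answer 4 (score: 0)

The SQL statements in the question and in the other answers are all wrong. I'll try to explain how to avoid the chicken-and-egg problem by using a function (which also makes the code clearer):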

SELECT  first_name,
        last_name, 
        slug,
        COUNT(slug) AS person_records,
        SUM(get_count_articles(id)) AS total_articles
FROM  people_person
GROUP BY first_name,
        last_name, 
        slug
HAVING  COUNT(*) > 1 AND SUM(get_count_articles(id))>=1
ORDER   BY  last_name, first_name;

Using this function (written in Oracle syntax; forgive my lack of knowledge of MySQL functions):

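CREATE OR REPLACE FUNCTION get_count_articles(p_id NUMBER) RETURN NUMBER IS
  l_mag_auth NUMBER;
  l_pod_guests NUMBER;
  l_blog_auth NUMBER;
BEGIN
  -- Magazine authorship rows for this person.
  -- NOTE: ma1 and a1 are not joined here, so this is a cross join;
  -- the join condition between them still needs to be added.
  SELECT COUNT(*)
  INTO l_mag_auth
  FROM magazine_author ma1, article a1
  WHERE ma1.person_id = p_id;

  -- Podcast episodes this person was a guest on.
  SELECT COUNT(*)
  INTO l_pod_guests
  FROM podcast_episode_guests pod1
  WHERE pod1.person_id = p_id;

  -- Blogs this person is an author of.
  SELECT COUNT(*)
  INTO l_blog_auth
  FROM blogs_blog_authors b1
  WHERE b1.person_id = p_id;

  RETURN l_mag_auth+l_pod_guests+l_blog_auth;
END;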
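
For MySQL, a helper with the same purpose might look roughly like the sketch below. It assumes each of the three link tables named in this answer has a person_id column, and it counts magazine_author rows directly instead of joining to article, because the join column is not given here. The table definitions are hypothetical stand-ins, and the DELIMITER lines are only needed in the mysql command-line client.

```sql
-- Hypothetical minimal link tables; the real ones will have more columns.
CREATE TABLE IF NOT EXISTS magazine_author (person_id INT);
CREATE TABLE IF NOT EXISTS podcast_episode_guests (person_id INT);
CREATE TABLE IF NOT EXISTS blogs_blog_authors (person_id INT);

DELIMITER //

CREATE FUNCTION get_count_articles(p_id INT) RETURNS INT
READS SQL DATA
BEGIN
    DECLARE l_mag_auth INT DEFAULT 0;
    DECLARE l_pod_guests INT DEFAULT 0;
    DECLARE l_blog_auth INT DEFAULT 0;

    -- Assumption: one magazine_author row per authored article.
    SELECT COUNT(*) INTO l_mag_auth
    FROM magazine_author
    WHERE person_id = p_id;

    SELECT COUNT(*) INTO l_pod_guests
    FROM podcast_episode_guests
    WHERE person_id = p_id;

    SELECT COUNT(*) INTO l_blog_auth
    FROM blogs_blog_authors
    WHERE person_id = p_id;

    -- Total number of related items for this person.
    RETURN l_mag_auth + l_pod_guests + l_blog_auth;
END //

DELIMITER ;
```

Once created, it can be called like any scalar function, for example SELECT get_count_articles(1);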

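Note 1: magazine_author should be joined to article above, since there may not actually be any articles.

Note 2: I removed id from the SELECT and GROUP BY of the original question, because it forces a wrong answer (id should be unique in the table, so grouping by it would return no records). The COUNT(slug) syntax may also confuse the issue here. If the output needs to show each of the duplicate rows, you have to join back to the people_person table to show the list of ids for each slug.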
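
Joining back to people_person to list every id behind a duplicated slug could look something like the following MySQL sketch. The table definitions and sample rows are made up for illustration, and only the table and column names already used in this thread are assumed to exist. It reads the requirement as "the slug is shared by more than one person, and at least one of those people has related content", which is only one possible interpretation of the question.

```sql
-- Hypothetical minimal schema: only the columns the query needs.
CREATE TABLE IF NOT EXISTS people_person (
    id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    slug VARCHAR(100)
);
CREATE TABLE IF NOT EXISTS magazine_author (person_id INT);
CREATE TABLE IF NOT EXISTS podcast_episode_guests (person_id INT);
CREATE TABLE IF NOT EXISTS blogs_blog_authors (person_id INT);

-- Made-up sample data.
INSERT INTO people_person (id, first_name, last_name, slug) VALUES
    (1, 'Ann', 'Lee', 'ann-lee'),   -- duplicate slug, has an article
    (2, 'Ann', 'Lee', 'ann-lee'),   -- duplicate slug, no content
    (3, 'Bob', 'Ray', 'bob-ray'),   -- duplicate slug, no content
    (4, 'Bob', 'Ray', 'bob-ray'),   -- duplicate slug, no content
    (5, 'Cy',  'Orr', 'cy-orr');    -- unique slug, has a podcast
INSERT INTO magazine_author (person_id) VALUES (1);
INSERT INTO podcast_episode_guests (person_id) VALUES (5);

SELECT p.id, p.first_name, p.last_name, p.slug
FROM people_person AS p
    -- Slugs used by more than one person record.
    JOIN (
        SELECT slug
        FROM people_person
        GROUP BY slug
        HAVING COUNT(*) > 1
    ) AS dup ON dup.slug = p.slug
    -- Slugs where at least one person has related content.
    JOIN (
        SELECT DISTINCT pp.slug
        FROM people_person AS pp
        WHERE EXISTS (SELECT 1 FROM magazine_author AS ma
                      WHERE ma.person_id = pp.id)
           OR EXISTS (SELECT 1 FROM podcast_episode_guests AS pod
                      WHERE pod.person_id = pp.id)
           OR EXISTS (SELECT 1 FROM blogs_blog_authors AS b
                      WHERE b.person_id = pp.id)
    ) AS active ON active.slug = p.slug
ORDER BY p.last_name, p.first_name, p.id;
```

With the sample rows above, only the two 'ann-lee' records (ids 1 and 2) qualify: 'bob-ray' is duplicated but has no content, and 'cy-orr' has content but is not duplicated. Grouping happens only inside the derived tables, so the outer SELECT can return id without running into the GROUP BY problem described in note 2.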