Removing a grouped data set when the total count, computed with a subquery, is zero

Time: 2014-08-06 14:52:37

Tags: sql

I am generating a data set that looks like this:

category    user     total
   1      jonesa       0
   2      jonesa       0
   3      jonesa       0
   1      smithb       0
   2      smithb       0
   3      smithb       5
   1      brownc       2
   2      brownc       3
   3      brownc       4

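If a particular user has 0 records in every category, is it possible to remove that user's rows from the set? If a user has some activity, like smithb, I want to keep all of their records, including the zero rows. I'm not sure how to go about this. I thought a CASE statement might help, but I'm not certain, and this is pretty complicated for me. Here is my query: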

SELECT DISTINCT c.category,
  u.user_name,
  CASE WHEN (
    SELECT COUNT(e.entry_id)
    FROM category c1 
    INNER JOIN entry e1
      ON c1.category_id = e1.category_id
      WHERE c1.category_id = c.category_id
      AND e.user_name = u.user_name
      AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
      AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
    THEN 'Yes'
  ELSE NULL
  END AS TOTAL
FROM user u 
INNER JOIN role r
  ON u.id = r.user_id
    AND r.id IN (1,2),
    category c 
LEFT JOIN entry e
  ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)

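I realize the CASE statement doesn't work, but it shows how I was trying to achieve this. I'm really not sure whether it is possible or whether it is the best direction. Any guidance is appreciated.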

3 answers:

Answer 0: (score: 2)

Try this:

delete from t1
where user in (
  select user
  from t1
  group by user
  having count(distinct category) = sum(case when total=0 then 1 else 0 end) )

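The subquery returns every user who meets your deletion requirement.

count(distinct category) gets the number of categories the user has; sum(case when total=0 then 1 else 0 end) gets the number of that user's rows whose total is zero, that is, rows with no activity. When the two values are equal, every one of the user's categories is empty, so the user is deleted.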
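As a minimal sketch, the DELETE above can be run against the sample data from the question using Python's sqlite3 module and an in-memory database. SQLite only stands in for the asker's database here, and the column name user_name (used in place of user) and the function name remaining_after_delete are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question: (category, user_name, total)
ROWS = [
    (1, "jonesa", 0), (2, "jonesa", 0), (3, "jonesa", 0),
    (1, "smithb", 0), (2, "smithb", 0), (3, "smithb", 5),
    (1, "brownc", 2), (2, "brownc", 3), (3, "brownc", 4),
]


def remaining_after_delete():
    """Load the sample rows, delete all-zero users, return what is left."""
    conn = sqlite3.connect(":memory:")
    # user_name is a hypothetical stand-in for the answer's "user" column
    conn.execute(
        "CREATE TABLE t1 (category INTEGER, user_name TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", ROWS)
    # A user is deleted when the number of zero-total rows equals
    # the number of distinct categories that user has.
    conn.execute(
        """
        DELETE FROM t1
        WHERE user_name IN (
            SELECT user_name
            FROM t1
            GROUP BY user_name
            HAVING COUNT(DISTINCT category)
                 = SUM(CASE WHEN total = 0 THEN 1 ELSE 0 END)
        )
        """
    )
    rows = conn.execute(
        "SELECT category, user_name, total FROM t1 "
        "ORDER BY user_name, category"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in remaining_after_delete():
        print(row)
```

With this data only jonesa is removed; smithb keeps all three rows, including the two zero rows.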

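Answer 1: (score: 1)

There are many ways to do this, but the terser the SQL, the harder its logic is to follow. For that reason, I think using several common table expressions avoids redundant joins while staying the most readable.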

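-- assuming user_name and category_name are unique on [user] and [category] respectively.
-- note: the [bracketed] identifiers are SQL Server style, while TO_DATE and NVL are Oracle functions;
-- adjust the identifier quoting to match your database.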

WITH valid_categories (category_id, category_name) AS 
(
    -- get set of valid categories
    SELECT c.category_id, c.category AS category_name
    FROM category c
    WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS 
(
    -- get set of users who belong to valid roles
    SELECT u.[user_name]
    FROM [user] u 
    WHERE EXISTS (
        SELECT *
        FROM [role] r
        WHERE u.id = r.[user_id] AND r.id IN (1,2)
    )

),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
    -- provides a flag of 1 for easier aggregation
    SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
    FROM [entry] e  
    WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
    -- determines if entry is within date range 
),
user_categories ([user_name], category_id, category_name) AS

(   SELECT u.[user_name], c.category_id, c.category_name
    FROM valid_users u
    -- get the cartesian product of users and categories
    CROSS JOIN valid_categories c
    -- get only users with a valid entry 
    WHERE EXISTS (
        SELECT *
        FROM valid_entries e
        WHERE e.[user_name] = u.[user_name]
    )
)

/*

You can use these for testing.

SELECT COUNT(*) AS valid_categories_count
FROM valid_categories

SELECT COUNT(*) AS valid_users_count
FROM valid_users

SELECT COUNT(*) AS valid_entries_count
FROM valid_entries

SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
    SELECT *
    FROM user_categories uc
    WHERE uc.user_name = u.user_name
)

SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
    SELECT *
    FROM user_categories uc
    WHERE uc.user_name = u.user_name
)

SELECT uc.[user_name], uc.[category_name], e.[entry_count] 
FROM user_categories uc
INNER JOIN  valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/

-- Finally, the results: 

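SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN  valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
-- the SUM aggregate requires a GROUP BY over the non-aggregated columns
GROUP BY uc.[user_name], uc.[category_name]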
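
The core of this answer (cross join users with categories, keep only users that have at least one entry, then left join and aggregate) can be sketched on a reduced schema with Python's sqlite3 module. This is an illustration under assumptions, not the answer's exact query: the table name app_user, the tiny data set, and the function name are hypothetical, the role and date filters are left out, and COUNT(e.entry_id) is used in place of the SUM(NVL(...)) flag.

```python
import sqlite3


def counts_for_active_users():
    """Return (user_name, category_id, entry_count) for users with entries."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical reduced schema: no roles, no dates
    conn.executescript(
        """
        CREATE TABLE app_user (user_name TEXT PRIMARY KEY);
        CREATE TABLE category (category_id INTEGER PRIMARY KEY);
        CREATE TABLE entry (
            entry_id INTEGER PRIMARY KEY,
            user_name TEXT,
            category_id INTEGER
        );
        INSERT INTO app_user VALUES ('jonesa'), ('smithb');
        INSERT INTO category VALUES (1), (2), (3);
        INSERT INTO entry (user_name, category_id)
        VALUES ('smithb', 3), ('smithb', 3);
        """
    )
    rows = conn.execute(
        """
        WITH user_categories AS (
            -- every user/category pair, but only for users with any entry
            SELECT u.user_name, c.category_id
            FROM app_user u
            CROSS JOIN category c
            WHERE EXISTS (
                SELECT 1 FROM entry e WHERE e.user_name = u.user_name
            )
        )
        SELECT uc.user_name, uc.category_id, COUNT(e.entry_id) AS entry_count
        FROM user_categories uc
        LEFT JOIN entry e
          ON e.user_name = uc.user_name
         AND e.category_id = uc.category_id
        GROUP BY uc.user_name, uc.category_id
        ORDER BY uc.user_name, uc.category_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in counts_for_active_users():
        print(row)
```

Because the EXISTS filter is applied before the left join, jonesa never appears in the result, while smithb still gets a zero row for each category without entries.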

Answer 2: (score: 1)

Here is another approach:

WITH totals AS (
  SELECT
    c.category,
    u.user_name,
    COUNT(e.entry_id) AS total,
    SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
  FROM
    user u
  INNER JOIN
    role r ON u.id = r.user_id
  CROSS JOIN
    category c
  LEFT JOIN
    entry e ON c.category_id = e.category_id
           AND u.user_name = e.user_name
           AND e.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
           AND e.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
  WHERE
    r.id IN (1, 2)
    AND c.category_id NOT IN (19, 20)
  GROUP BY
    c.category,
    u.user_name
)
SELECT
  category,
  user_name,
  total
FROM
  totals
WHERE
  user_total > 0
;

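The totals common table expression calculates the total per user and category, as well as the total across all categories for each user (using SUM() OVER ...). The main query returns only the rows where the user total is greater than zero.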
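
The windowed-total idea can be tried on the sample data from the question with Python's sqlite3 module. This is only a sketch under assumptions: the per-category counts are loaded directly into a hypothetical per_category table instead of being computed from the user, role, category and entry tables, the function name is made up, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3


def rows_for_users_with_activity():
    """Keep every row of users whose total across all categories is > 0."""
    conn = sqlite3.connect(":memory:")
    # per_category is a hypothetical table holding precomputed counts
    conn.executescript(
        """
        CREATE TABLE per_category (
            category INTEGER, user_name TEXT, total INTEGER
        );
        INSERT INTO per_category VALUES
            (1, 'jonesa', 0), (2, 'jonesa', 0), (3, 'jonesa', 0),
            (1, 'smithb', 0), (2, 'smithb', 0), (3, 'smithb', 5),
            (1, 'brownc', 2), (2, 'brownc', 3), (3, 'brownc', 4);
        """
    )
    rows = conn.execute(
        """
        WITH totals AS (
            SELECT category, user_name, total,
                   -- per-user total repeated on each of that user's rows
                   SUM(total) OVER (PARTITION BY user_name) AS user_total
            FROM per_category
        )
        SELECT category, user_name, total
        FROM totals
        WHERE user_total > 0
        ORDER BY user_name, category
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rows_for_users_with_activity():
        print(row)
```

The window function leaves the row count unchanged, so the zero rows of smithb survive the filter while all of jonesa's rows are dropped.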