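I am working with Teradata SQL. I want to get the duplicated fields together with their count and the other variables. I could only find a way to get the count, not the other variables as well.
Available input: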
+----+------+------------+
| id | name | Date       |
+----+------+------------+
| 1  | abc  | 21.03.2015 |
| 1  | def  | 22.04.2015 |
| 2  | ajk  | 22.03.2015 |
| 3  | ghi  | 23.03.2015 |
| 3  | ghi  | 23.03.2015 |
+----+------+------------+
Expected output:
+----+------+-------+
| id | name | count |  // Other fields
+----+------+-------+
| 1  | abc  | 2     |
| 1  | def  | 2     |
| 2  | ajk  | 1     |
| 3  | ghi  | 2     |
| 3  | ghi  | 2     |
+----+------+-------+
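What I am looking for:
I am looking for all duplicated rows, where duplication is determined by id, and I want to retrieve the duplicated rows themselves.
All I have so far is: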
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This does not show the correct data. Thanks.
Answer 0: (score: 2)
You can use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
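To check the window-aggregate query against the sample data from the question, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Teradata. That swap is an assumption made only so the example is runnable: SQLite supports window functions from version 3.25 but has no QUALIFY, so the derived-table form is used. The table name users follows the answer; the column name dt is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, dt TEXT)")

# Sample rows copied from the question's input table.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "abc", "21.03.2015"),
        (1, "def", "22.04.2015"),
        (2, "ajk", "22.03.2015"),
        (3, "ghi", "23.03.2015"),
        (3, "ghi", "23.03.2015"),
    ],
)

# Count rows per id with a window aggregate, then keep ids seen more than once.
query = """
SELECT id, name, dt, duplicates
FROM (
    SELECT id, name, dt,
           COUNT(*) OVER (PARTITION BY id) AS duplicates
    FROM users
) AS sub
WHERE duplicates > 1
ORDER BY id, name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the query returns the two rows for id 1 and the two rows for id 3, each with a count of 2. Dropping the outer WHERE filter also keeps id 2 with a count of 1, which is what the expected output table in the question shows.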
Answer 1: (score: 1)
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
Edit: per your edit, duplicates are determined by id only. I have edited my query accordingly.
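The join approach can be tried on the question's sample data with a small Python sketch. It uses the standard-library sqlite3 module in place of Teradata (an assumption, purely to make the example runnable), with the derived table reading from the same Table_NAME as the outer query. The column name dt is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_NAME (id INTEGER, name TEXT, dt TEXT)")

# Sample rows copied from the question's input table.
conn.executemany(
    "INSERT INTO Table_NAME VALUES (?, ?, ?)",
    [
        (1, "abc", "21.03.2015"),
        (1, "def", "22.04.2015"),
        (2, "ajk", "22.03.2015"),
        (3, "ghi", "23.03.2015"),
        (3, "ghi", "23.03.2015"),
    ],
)

# Aggregate per id in a derived table, then join back to fetch every column.
query = """
SELECT TN.id, TN.name, TN.dt, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
    SELECT id, COUNT(1) AS Count_Dups
    FROM Table_NAME
    GROUP BY id
    HAVING COUNT(1) > 1  -- keep only ids that occur more than once
) cnt
ON cnt.id = TN.id
ORDER BY TN.id, TN.name
"""
joined_rows = conn.execute(query).fetchall()
for row in joined_rows:
    print(row)
```

Because the count is computed once per id and joined back, every original row for a duplicated id is returned along with its other columns. Removing the HAVING clause would also return the non-duplicated ids with a count of 1.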
Answer 2: (score: 0)
Try this:
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1