I have a Postgres database containing a very long table with 3 columns, like this:
s_id | c_id | a_id
1 | 1 | 2
1 | 1 | 3
1 | 3 | 15
2 | 1 | 2
2 | 2 | 23
3 | 1 | 2
3 | 3 | 16
I have a query that finds all s_ids that contain both c_id 1 and c_id 3, and returns them along with a count:
SELECT s_id, COUNT(s_id) as matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC
What I get is the following:
s_id | matching_clusters
1 | 3
3 | 2
However, I want duplicate c_ids to be counted only once, so the result should be
s_id | matching_clusters
1 | 2
3 | 2
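Any suggestions on how to do this? I thought I could put DISTINCT inside the COUNT call, but that did not work. I could join the table to itself on the distinct c_id results, but I would rather not run the query a second time, because queries on this table are computationally very expensive.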
Answer 0 (score: 1)
If I understand correctly, then this will work:
SELECT s_id, 2 as matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id
HAVING COUNT(DISTINCT c_id) = 2
ORDER BY matching_clusters DESC;
This may be what you want:
SELECT s_id, COUNT(DISTINCT c_id) as matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id
HAVING COUNT(DISTINCT c_id) = 2
ORDER BY matching_clusters DESC;
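Note the use of DISTINCT in the HAVING clause. Without it, an s_id with two rows for c_id 1 and none for c_id 3 would still pass the filter.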
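The COUNT(DISTINCT c_id) query can be checked against the sample rows from the question. The sketch below is only an illustration: it assumes an in-memory SQLite database as a stand-in for Postgres (this query uses syntax both accept), the helper name `matching_clusters` is hypothetical, and `s_id` is added to the ORDER BY purely to make the order of tied rows deterministic.

```python
import sqlite3

# Sample rows from the question: (s_id, c_id, a_id)
ROWS = [
    (1, 1, 2), (1, 1, 3), (1, 3, 15),
    (2, 1, 2), (2, 2, 23),
    (3, 1, 2), (3, 3, 16),
]

# The query from the answer; the extra "s_id" sort key is an
# assumption added here so that tied rows come back in a fixed order.
QUERY = """
SELECT s_id, COUNT(DISTINCT c_id) AS matching_clusters
FROM test
WHERE c_id IN (1, 3)
GROUP BY s_id
HAVING COUNT(DISTINCT c_id) = 2
ORDER BY matching_clusters DESC, s_id
"""


def matching_clusters(rows):
    """Load rows into a temporary table and run the query (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (s_id INTEGER, c_id INTEGER, a_id INTEGER)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for s_id, count in matching_clusters(ROWS):
        print(s_id, count)
```

On the sample data, s_id 1 and s_id 3 each come back with a count of 2, and s_id 2 is dropped because it has no row with c_id 3.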
Answer 1 (score: -1)
Try this:
SELECT s_id, COUNT(DISTINCT s_id) as matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC
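A quick check of this query against the question's sample rows suggests it does not produce the requested counts: because the rows are grouped by s_id, COUNT(DISTINCT s_id) is 1 for every group. The sketch below assumes an in-memory SQLite database as a stand-in for Postgres; the helper name `run_answer_1` is hypothetical, and `s_id` is added to the ORDER BY only to fix the order of tied rows.

```python
import sqlite3

# Sample rows from the question: (s_id, c_id, a_id)
ROWS = [
    (1, 1, 2), (1, 1, 3), (1, 3, 15),
    (2, 1, 2), (2, 2, 23),
    (3, 1, 2), (3, 3, 16),
]

# The query from this answer, unchanged except for the extra
# "s_id" sort key (an assumption, added for a deterministic order).
QUERY = """
SELECT s_id, COUNT(DISTINCT s_id) AS matching_clusters
FROM test
WHERE c_id IN (1, 3)
GROUP BY s_id HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC, s_id
"""


def run_answer_1(rows):
    """Load rows into a temporary table and run the query (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (s_id INTEGER, c_id INTEGER, a_id INTEGER)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for s_id, count in run_answer_1(ROWS):
        print(s_id, count)
```

Each group contains exactly one distinct s_id, so the count column is 1 for both s_id 1 and s_id 3 rather than the desired 2.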