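I have been struggling with this problem for a long time and can't figure out how to solve it. It is hard for me to describe, so please bear with me. There are two tables:
Table "Users"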
UserId PK
Gender
Table "Forms"
FormId PK
UserId1 FK
UserId2 FK
Type
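A form is always related to two users, but not every user has a related form. Now I want to count the specified gender only for those users who have a related form.
As a result, I would like something like this: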
# | Gender | GenderCount
1 | male | 43
2 | female | 12
3 | trans | 2
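I tried the following SQL script, but the result is not plausible (the sum of all GenderCount values is greater than the actual number of users):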
SELECT u.Gender AS 'Gender', COUNT(u.Gender) AS 'GenderCount'
FROM Users u, Forms f
WHERE ((f.UserId1 = u.UserId)
OR (f.UserId2 = u.UserId))
AND (Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount
DESC
Any hints on how to solve this?
Answer 0 (score: 2)
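Let's restate what you want: count users by gender, but only those users who have a related form.
Put that way, the answer becomes fairly obvious, at least in pseudocode: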
SELECT
u.Gender,
COUNT(u.Gender)
FROM
Users u
WHERE
[User has answered a form]
GROUP BY
u.Gender
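The easiest way to determine whether a user has answered a form depends on the particular flavor of SQL you are using. You will need a subquery, and there are a few options for how to use it.
IN is the most commonly used approach: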
SELECT
u.Gender Gender,
COUNT(u.Gender) GenderCount
FROM
Users u
WHERE
u.UserId IN (
SELECT f.UserId1 user_id FROM Forms f WHERE Type = 'Foo'
UNION
SELECT f.UserId2 user_id FROM Forms f WHERE Type = 'Foo'
)
GROUP BY
Gender
ORDER BY
GenderCount DESC
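To sanity-check the IN version, here is a minimal sketch that runs it against a few made-up rows. It assumes SQLite through Python's sqlite3 module, which the question does not mention. The sample data is hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
SCHEMA_AND_ROWS = """
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
CREATE TABLE Forms (
    FormId  INTEGER PRIMARY KEY,
    UserId1 INTEGER REFERENCES Users(UserId),
    UserId2 INTEGER REFERENCES Users(UserId),
    Type    TEXT
);
INSERT INTO Users VALUES
    (1, 'male'), (2, 'male'), (3, 'female'), (4, 'trans'), (5, 'female');
-- User 4 only appears on a 'Bar' form; user 5 has no form at all.
INSERT INTO Forms VALUES
    (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 1, 4, 'Bar'), (4, 2, 3, 'Foo');
"""

IN_QUERY = """
SELECT u.Gender AS Gender, COUNT(u.Gender) AS GenderCount
FROM Users u
WHERE u.UserId IN (
    SELECT f.UserId1 FROM Forms f WHERE f.Type = 'Foo'
    UNION
    SELECT f.UserId2 FROM Forms f WHERE f.Type = 'Foo'
)
GROUP BY u.Gender
ORDER BY GenderCount DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_ROWS)

# Users 1, 2 and 3 appear on 'Foo' forms, each of them on two forms,
# but every user contributes exactly one row to the count.
in_counts = conn.execute(IN_QUERY).fetchall()
print(in_counts)  # [('male', 2), ('female', 1)]
```

Each user is counted once no matter how many forms they appear on, because the subquery only decides whether a Users row is kept; it never multiplies rows.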
Where available, EXISTS reads more naturally and is sometimes faster:
SELECT
u.Gender Gender,
COUNT(u.Gender) GenderCount
FROM
Users u
WHERE
EXISTS(
SELECT '1'
FROM Forms f
WHERE
(f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
AND Type = 'Foo'
)
GROUP BY
Gender
ORDER BY
GenderCount DESC
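If you need this for more than one form type, the EXISTS version is easy to wrap in a helper. The sketch below assumes SQLite through Python's sqlite3 module. The `count_genders` function, the sample rows, and the secondary sort on Gender (added only to make ties deterministic) are my own additions, not part of the question.

```python
import sqlite3


def count_genders(conn, form_type):
    """Count users per gender, keeping only users that appear on a form of form_type."""
    query = """
        SELECT u.Gender AS Gender, COUNT(u.Gender) AS GenderCount
        FROM Users u
        WHERE EXISTS (
            SELECT 1
            FROM Forms f
            WHERE (f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
              AND f.Type = ?
        )
        GROUP BY u.Gender
        ORDER BY GenderCount DESC, u.Gender
    """
    return conn.execute(query, (form_type,)).fetchall()


# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
CREATE TABLE Forms (
    FormId  INTEGER PRIMARY KEY,
    UserId1 INTEGER REFERENCES Users(UserId),
    UserId2 INTEGER REFERENCES Users(UserId),
    Type    TEXT
);
INSERT INTO Users VALUES
    (1, 'male'), (2, 'male'), (3, 'female'), (4, 'trans'), (5, 'female');
INSERT INTO Forms VALUES
    (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 1, 4, 'Bar'), (4, 2, 3, 'Foo');
""")

foo_counts = count_genders(conn, "Foo")
bar_counts = count_genders(conn, "Bar")
none_counts = count_genders(conn, "Baz")  # no form has this type

print(foo_counts)   # [('male', 2), ('female', 1)]
print(bar_counts)   # [('male', 1), ('trans', 1)]
print(none_counts)  # []
```

Passing the type as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe if the value ever comes from user input.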
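On speed: query optimizers will usually convert IN to EXISTS where possible, to avoid selecting extra rows unnecessarily. However, working with multiple columns requires either OR or UNION, so in this case the two may well perform about the same, since neither OR nor UNION tends to play well with indexes.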
Answer 1 (score: 1)
Skip the join, which generates multiple rows for each user:
SELECT Gender, COUNT(Gender) AS 'GenderCount'
FROM Users
WHERE UserId IN (SELECT UserId1 FROM Forms WHERE Type = 'Foo'
UNION
SELECT UserId2 FROM Forms WHERE Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount DESC
Or, if you want to avoid UNION (which, by the way, is perfectly valid in this case), you can use OR like this:
SELECT Gender, COUNT(Gender) AS 'GenderCount'
FROM Users
WHERE UserId IN (SELECT UserId1 FROM Forms WHERE Type = 'Foo')
OR UserId IN (SELECT UserId2 FROM Forms WHERE Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount DESC
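As others have pointed out, there are also ways to do this with a JOIN. However, a JOIN adds unnecessary work for the DBMS engine, because it first has to match rows and then reduce them to DISTINCT values.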
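To make the row multiplication concrete, here is a small sketch that counts how many joined rows each user produces. It assumes SQLite through Python's sqlite3 module and uses made-up sample rows; only the table and column names are taken from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
CREATE TABLE Forms (
    FormId  INTEGER PRIMARY KEY,
    UserId1 INTEGER REFERENCES Users(UserId),
    UserId2 INTEGER REFERENCES Users(UserId),
    Type    TEXT
);
INSERT INTO Users VALUES
    (1, 'male'), (2, 'male'), (3, 'female'), (4, 'trans'), (5, 'female');
INSERT INTO Forms VALUES
    (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 1, 4, 'Bar'), (4, 2, 3, 'Foo');
""")

# One output row per user, showing how many (user, form) pairs the join produced.
rows_per_user = conn.execute("""
    SELECT u.UserId, COUNT(*) AS MatchedRows
    FROM Users u
    INNER JOIN Forms f
        ON (f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
       AND f.Type = 'Foo'
    GROUP BY u.UserId
    ORDER BY u.UserId
""").fetchall()

joined_rows = sum(n for _, n in rows_per_user)  # rows the join has to materialize
distinct_users = len(rows_per_user)             # users actually involved

print(rows_per_user)                # [(1, 2), (2, 2), (3, 2)]
print(joined_rows, distinct_users)  # 6 3
```

With this sample data the join yields six rows for three users, so any aggregate over the joined rows has to deduplicate before it can give a per-user count.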
Answer 2 (score: 1)
SELECT u1.Gender AS 'Gender', COUNT(*) AS 'GenderCount'
FROM
Users u1
INNER JOIN
(SELECT DISTINCT u.UserId
FROM
Users u
INNER JOIN Forms f ON ((f.UserId1 = u.UserId)
OR (f.UserId2 = u.UserId))
AND (f.Type = 'Foo')) T ON T.UserId = u1.UserId
GROUP BY Gender
ORDER BY GenderCount DESC
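Answer 3 (score: 0)
You should use
count(distinct u.UserId)
so that each user is counted only once: count(distinct field_name) counts the number of unique values in field_name, so counting distinct values of the primary key gives you the number of unique users, which is exactly what you are looking for.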
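Here is a minimal sketch of that change applied to the query from the question, run side by side with the original. It assumes SQLite through Python's sqlite3 module; the sample rows are made up and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
CREATE TABLE Forms (
    FormId  INTEGER PRIMARY KEY,
    UserId1 INTEGER REFERENCES Users(UserId),
    UserId2 INTEGER REFERENCES Users(UserId),
    Type    TEXT
);
INSERT INTO Users VALUES
    (1, 'male'), (2, 'male'), (3, 'female'), (4, 'trans'), (5, 'female');
INSERT INTO Forms VALUES
    (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 1, 4, 'Bar'), (4, 2, 3, 'Foo');
""")

# The query from the question, with the aggregate left as a placeholder.
QUERY_TEMPLATE = """
SELECT u.Gender AS Gender, {aggregate} AS GenderCount
FROM Users u, Forms f
WHERE (f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
  AND f.Type = 'Foo'
GROUP BY u.Gender
ORDER BY GenderCount DESC
"""

# COUNT(u.Gender) counts every joined (user, form) row.
naive_counts = conn.execute(
    QUERY_TEMPLATE.format(aggregate="COUNT(u.Gender)")
).fetchall()

# COUNT(DISTINCT u.UserId) counts each user once per gender group.
distinct_counts = conn.execute(
    QUERY_TEMPLATE.format(aggregate="COUNT(DISTINCT u.UserId)")
).fetchall()

print(naive_counts)     # [('male', 4), ('female', 2)]
print(distinct_counts)  # [('male', 2), ('female', 1)]
```

The naive counts add up to six even though the table holds only five users, which is the symptom described in the question; the distinct counts add up to the three users who actually appear on a 'Foo' form.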
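Also, you may be better off using an IN clause like this rather than a join:
select Gender, count(distinct UserId) as GenderCount
from Users u
where (u.UserId in (select UserId1 from Forms where Type = 'Foo')
    or u.UserId in (select UserId2 from Forms where Type = 'Foo'))
group by Gender
It will probably be slightly faster.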