PostgreSQL: IN clause vs. nested SELECTs with JOIN, performance

Posted: 2017-02-10 18:05:29

Tags: postgresql join count

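I currently have a query that works well but will run into scaling problems. The alternative I came up with is very slow, and I'm looking to speed up that second query.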

The old query, which does not scale well:

SELECT users.score
FROM users
WHERE
  users.id IN (
    SELECT user_id 
    FROM companies_users 
    WHERE companies_users.company_id = X
)

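I then iterate over the returned scores to group them. Scores range from -10 to 10. The problem comes from the IN (SELECT ...) subquery and the iteration: the subquery can return more than a million user_ids.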

The alternative I came up with should scale better, but it is very slow:

SELECT 
  COUNT(*) AS total_scores,
  (SELECT COUNT(*) FROM users 
    JOIN companies_users AS cu ON cu.user_id = users.id
    WHERE users.score = 10 AND cu.company_id = X) AS "10",
  (SELECT COUNT(*) FROM users 
    JOIN companies_users AS cu ON cu.user_id = users.id
    WHERE users.score = 9 AND cu.company_id = X) AS "9",
...
  (SELECT COUNT(*) FROM users 
    JOIN companies_users AS cu ON cu.user_id = users.id
    WHERE users.score = -9 AND cu.company_id = X) AS "-9",
  (SELECT COUNT(*) FROM users 
    JOIN companies_users AS cu ON cu.user_id = users.id
    WHERE users.score = -10 AND cu.company_id = X) AS "-10"
FROM users
  JOIN companies_users AS cu ON cu.user_id = users.id
  WHERE cu.company_id = X

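The first query requires iterating over the results to get the data into a usable shape; the second one returns it in the shape I need.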

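Is there a way to pull the JOIN out of the nested SELECTs? That seems to cause most of the slowdown in the second query. Also, am I right that the first query will not scale well when it has to handle millions of IDs?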

1 answer:

Answer (score: 1)

What's wrong with:

SELECT u.score, COUNT(*) AS total
FROM companies_users cu
    JOIN users u ON cu.user_id = u.id
WHERE cu.company_id=?
GROUP BY u.score
ORDER BY u.score
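If you need the single-row layout of the second query (one column per score) instead of one row per score, a minimal sketch, assuming PostgreSQL 9.4 or later for the aggregate FILTER clause, is to do the join once and let each column count only its own subset of rows. The temp tables, sample rows, and company id 42 are hypothetical, included only so the script runs on its own.

```sql
-- Hypothetical stand-ins for the real tables, only so this sketch is self-contained.
CREATE TEMP TABLE users (id integer PRIMARY KEY, score integer NOT NULL);
CREATE TEMP TABLE companies_users (company_id integer NOT NULL, user_id integer NOT NULL);

INSERT INTO users (id, score) VALUES (1, 10), (2, 10), (3, 9), (4, -10), (5, 3);
INSERT INTO companies_users (company_id, user_id) VALUES (42, 1), (42, 2), (42, 3), (42, 4), (7, 5);

-- The join runs once; each FILTER clause restricts COUNT(*) to one score value.
SELECT
  COUNT(*)                              AS total_scores,
  COUNT(*) FILTER (WHERE u.score = 10)  AS "10",
  COUNT(*) FILTER (WHERE u.score = 9)   AS "9",
  -- ... one column per remaining score value, as in the question ...
  COUNT(*) FILTER (WHERE u.score = -9)  AS "-9",
  COUNT(*) FILTER (WHERE u.score = -10) AS "-10"
FROM companies_users cu
JOIN users u ON u.id = cu.user_id
WHERE cu.company_id = 42;  -- hypothetical company id standing in for X
```

With this sample data the query returns total_scores = 4 and 2, 1, 0, 1 for the "10", "9", "-9" and "-10" columns; user 5 belongs to company 7 and is not counted. Unlike the correlated subqueries, the cost no longer grows with the number of score columns.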

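Also, do you have the appropriate indexes? You need an index on companies_users(company_id) and an index on users(id). You could also try adding one on companies_users(user_id) in case the planner decides to run the query the other way around. EXPLAIN and EXPLAIN ANALYZE are your friends.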
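The index advice above could look like the following sketch. The index names, the temp-table stand-ins for the real tables, and company id 42 are hypothetical; against the real schema you would run only the CREATE INDEX statements and the EXPLAIN ANALYZE. If users.id is already the primary key, PostgreSQL has created a unique index on it, so no separate index is needed there.

```sql
-- Hypothetical stand-ins for the real tables, only so this sketch is self-contained.
CREATE TEMP TABLE users (id integer PRIMARY KEY, score integer NOT NULL);
CREATE TEMP TABLE companies_users (company_id integer NOT NULL, user_id integer NOT NULL);

-- Lets the planner find all membership rows for one company without a full scan.
CREATE INDEX companies_users_company_id_idx ON companies_users (company_id);
-- Helps if the planner chooses to drive the join from the users side instead.
CREATE INDEX companies_users_user_id_idx ON companies_users (user_id);

-- EXPLAIN ANALYZE actually executes the query and reports the chosen plan with real timings.
EXPLAIN ANALYZE
SELECT u.score, COUNT(*) AS total
FROM companies_users cu
JOIN users u ON cu.user_id = u.id
WHERE cu.company_id = 42  -- hypothetical company id
GROUP BY u.score
ORDER BY u.score;
```

On tiny or empty tables the planner may still prefer sequential scans, so the plan is only meaningful on realistic data volumes.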