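I am trying to implement the slope one algorithm. I have an online consultation system in which experts can consult users. Experts are users with type = 2, and I need to build a "people who consulted this expert" feature. The subqueries for expert_id1 and expert_id2 each return an array of 0 (not consulted) and 1 (consulted) values, but each array has more than 100k elements and the query runs very slowly. Any ideas on how to optimize this query?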
SELECT e1.id as expert_id1, e2.id as expert_id2,
  (
    SELECT array_accum(c.consulted) FROM (
      SELECT CASE WHEN (c.id is null) THEN 0 ELSE 1 END as consulted
      FROM co_user u
      CROSS JOIN user e
      LEFT JOIN consultation c ON e.id = c.expert_id and c.user_id = u.id
      WHERE e.type = 2 AND e.id = e1.id) as c
  ) as expert_id1_consulted,
  (
    SELECT array_accum(c.consulted) FROM (
      SELECT CASE WHEN (c.id is null) THEN 0 ELSE 1 END as consulted
      FROM user u
      CROSS JOIN user e
      LEFT JOIN consultation c ON e.id = c.expert_id and c.user_id = u.id
      WHERE e.type = 2 AND e.id = e2.id) as c
  ) as expert_id2_consulted
FROM user e1
CROSS JOIN user e2
WHERE e1.type = 2 AND
      e2.type = 2 AND
      e2.id > e1.id
ORDER BY e1.id
Answer 0 (score: 0)
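EXPLAIN ANALYZE output would be really helpful, but even without it there are a few red flags in this query. FWIW, I tend to avoid subselects in the column list wherever I can, because they hurt readability.
In this case, however, the bigger problem is that your subplans create unnecessary joins across potentially large tables.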
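The first thing to do is to break out those subselects. They make your query harder to read and follow, and they add a number of repeated joins, which probably means extra scans of large tables. For example, you can put the CASE inside the array_agg.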
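As a rough sketch of what that could look like (not tested against your schema), the per-expert array can be built once in a CTE and then paired up. This assumes both subqueries were meant to read from the same user table, that consultation holds at most one row per (expert, user) pair, and that you are on PostgreSQL 9.0 or later so that the built-in array_agg accepts ORDER BY. The name expert_vec is made up for illustration, and "user" is quoted because it is a reserved word in PostgreSQL.

```sql
-- Sketch only: build each expert's 0/1 array once, then join the results.
WITH expert_vec AS (
    SELECT e.id AS expert_id,
           -- CASE goes inside the aggregate; ORDER BY keeps the element
           -- order identical for every expert so arrays line up by user.
           array_agg(CASE WHEN c.id IS NULL THEN 0 ELSE 1 END
                     ORDER BY u.id) AS consulted
    FROM "user" e
    CROSS JOIN "user" u
    LEFT JOIN consultation c
           ON c.expert_id = e.id AND c.user_id = u.id
    WHERE e.type = 2
    GROUP BY e.id
)
SELECT v1.expert_id AS expert_id1,
       v2.expert_id AS expert_id2,
       v1.consulted AS expert_id1_consulted,
       v2.consulted AS expert_id2_consulted
FROM expert_vec v1
JOIN expert_vec v2 ON v2.expert_id > v1.expert_id
ORDER BY v1.expert_id;
```

The point of this shape is that the expensive join against consultation runs once per expert rather than twice per expert pair. If a user can consult the same expert more than once, the LEFT JOIN would produce duplicate array elements, so you would need to deduplicate consultation first.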
If this doesn't help, please post the EXPLAIN ANALYZE results and we can look at indexes from there.
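For reference, collecting the plan and trying a candidate index might look roughly like the following. The index name and column choice are only a guess based on the join condition in the question, not something the plan has confirmed, and note that EXPLAIN ANALYZE actually executes the query.

```sql
-- Run the aggregation and report the real plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.id,
       array_agg(CASE WHEN c.id IS NULL THEN 0 ELSE 1 END ORDER BY u.id)
FROM "user" e
CROSS JOIN "user" u
LEFT JOIN consultation c
       ON c.expert_id = e.id AND c.user_id = u.id
WHERE e.type = 2
GROUP BY e.id;

-- Hypothetical index covering the join columns; only worth keeping
-- if the plan shows repeated scans of consultation.
CREATE INDEX consultation_expert_user_idx
    ON consultation (expert_id, user_id);
```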