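Suppose we have 4 tables A, B, C, D in some unspecified relational SQL database. A references B, and also C and D. A reference means that A has a column refX_id with A.refX_id = X.id, where X is B, C, or D (an ordinary 1:N foreign key).
What I want is to query table A based on columns in all of the referenced tables B, C, and D. My question is: which of the following variants is generally better (in terms of usability, efficiency, and speed)?
Variant 1: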
SELECT DISTINCT A.* FROM A
JOIN B ON A.refB_id = B.id
JOIN C ON A.refC_id = C.id
JOIN D ON A.refD_id = D.id
WHERE <condition on B> AND <condition on C> AND <condition on D>;
I prefer this one from the database point of view, but it looks a bit harder to program.
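To make Variant 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column `val`, the sample rows, and the `val > ?` conditions are all assumptions made for illustration; the question leaves the schema and the conditions unspecified.

```python
import sqlite3

# Hypothetical schema and data: the `val` column and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE D (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        refB_id INTEGER REFERENCES B(id),
        refC_id INTEGER REFERENCES C(id),
        refD_id INTEGER REFERENCES D(id)
    );
    INSERT INTO B VALUES (1, 5), (2, 50);
    INSERT INTO C VALUES (1, 5), (2, 50);
    INSERT INTO D VALUES (1, 5), (2, 50);
    INSERT INTO A VALUES (1, 2, 2, 2), (2, 2, 2, 1), (3, 1, 2, 2), (4, 2, 2, 2);
""")

def query_variant1(conn, min_b, min_c, min_d):
    """Variant 1: one round trip, the database does all the filtering."""
    sql = """
        SELECT DISTINCT A.* FROM A
        JOIN B ON A.refB_id = B.id
        JOIN C ON A.refC_id = C.id
        JOIN D ON A.refD_id = D.id
        WHERE B.val > ? AND C.val > ? AND D.val > ?
        ORDER BY A.id
    """
    return conn.execute(sql, (min_b, min_c, min_d)).fetchall()

rows = query_variant1(conn, 10, 10, 10)
print(rows)  # [(1, 2, 2, 2), (4, 2, 2, 2)]
```

On the program side this is a single parameterized statement with three bound values, so the programming effort may be smaller than it first appears.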
Variant 2:
SELECT id FROM B WHERE <condition on B>; # store the result in array "BIds" on the program side
SELECT id FROM C WHERE <condition on C>; # store the result in array "CIds" on the program side
SELECT id FROM D WHERE <condition on D>; # store the result in array "DIds" on the program side
SELECT A.* FROM A
WHERE refB_id IN (<B_ids>) AND refC_id IN (<C_ids>) AND refD_id IN (<D_ids>);
# <B_ids> means the whole array of ids expanded inline, which can result in a very long query string
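I think Variant 2 is clearly the worse option and unusable with potentially large data. But I have heard that many frameworks routinely use it because it is relatively straightforward. Is it legitimate, in the general case, to query data like this when I know that the contents of the IN clause come from the result of another query?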
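For comparison, the program-side flow of Variant 2 might look like the sketch below, again with Python's sqlite3. The `val` column, the sample data, and the helper names `ids_where` and `placeholders` are hypothetical. The sketch binds each id as a parameter instead of pasting values into the query string.

```python
import sqlite3

# Hypothetical schema and data: the `val` column and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE D (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY,
                    refB_id INTEGER, refC_id INTEGER, refD_id INTEGER);
    INSERT INTO B VALUES (1, 5), (2, 50);
    INSERT INTO C VALUES (1, 5), (2, 50);
    INSERT INTO D VALUES (1, 5), (2, 50);
    INSERT INTO A VALUES (1, 2, 2, 2), (2, 2, 2, 1), (3, 1, 2, 2), (4, 2, 2, 2);
""")

def ids_where(conn, table, min_val):
    """Fetch matching ids into a Python list (the "BIds" array of Variant 2)."""
    if table not in ("B", "C", "D"):  # table names are fixed, never user input
        raise ValueError(f"unexpected table: {table}")
    cursor = conn.execute(f"SELECT id FROM {table} WHERE val > ?", (min_val,))
    return [row[0] for row in cursor]

def placeholders(ids):
    """One '?' per id, so values are bound rather than pasted into the SQL."""
    return ", ".join("?" for _ in ids)

def query_variant2(conn, min_b, min_c, min_d):
    b_ids = ids_where(conn, "B", min_b)
    c_ids = ids_where(conn, "C", min_c)
    d_ids = ids_where(conn, "D", min_d)
    # An empty IN () list is rejected by many databases; no ids means no rows.
    if not (b_ids and c_ids and d_ids):
        return []
    sql = f"""
        SELECT A.* FROM A
        WHERE refB_id IN ({placeholders(b_ids)})
          AND refC_id IN ({placeholders(c_ids)})
          AND refD_id IN ({placeholders(d_ids)})
        ORDER BY id
    """
    return conn.execute(sql, b_ids + c_ids + d_ids).fetchall()

result = query_variant2(conn, 10, 10, 10)
print(result)  # [(1, 2, 2, 2), (4, 2, 2, 2)]
```

The sketch makes the costs visible: four round trips instead of one, every matching id travels to the program and back, and the statement grows with the id lists. Databases and drivers commonly cap the number of bound parameters or the statement length (SQLite, for example, has a compile-time limit named SQLITE_MAX_VARIABLE_NUMBER), so large lists would need batching.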
Answer 0: (score: 3)
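I'm not sure which frameworks use the second approach, but the first approach is the one I would go with, and so would most people. If you have created proper indexes on the join columns of all the tables, the first approach will produce a better plan than the second, because the second has multiple IN clauses. What if each IN has to process millions of elements?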
Also, I would change the INNER JOIN to LEFT JOIN, assuming that not all IDs match, and move the WHERE conditions into the JOIN ON conditions, like this:
SELECT DISTINCT A.* FROM A
LEFT JOIN B ON A.refB_id = B.id AND <condition on B>
LEFT JOIN C ON A.refC_id = C.id AND <condition on C>
LEFT JOIN D ON A.refD_id = D.id AND <condition on D>;
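One thing worth checking before adopting this rewrite: a LEFT JOIN keeps every row of A whether or not the ON condition matches, so moving the conditions into ON changes which rows come back. The sketch below compares the two forms on a hypothetical schema (the `val` column, the data, and the `val > 10` conditions are assumptions) using Python's sqlite3.

```python
import sqlite3

# Hypothetical schema and data: the `val` column and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE D (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY,
                    refB_id INTEGER, refC_id INTEGER, refD_id INTEGER);
    INSERT INTO B VALUES (1, 5), (2, 50);
    INSERT INTO C VALUES (1, 5), (2, 50);
    INSERT INTO D VALUES (1, 5), (2, 50);
    INSERT INTO A VALUES (1, 2, 2, 2), (2, 2, 2, 1), (3, 1, 2, 2), (4, 2, 2, 2);
""")

def a_ids(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Inner joins with the conditions in WHERE (Variant 1 from the question).
inner_ids = a_ids("""
    SELECT DISTINCT A.id FROM A
    JOIN B ON A.refB_id = B.id
    JOIN C ON A.refC_id = C.id
    JOIN D ON A.refD_id = D.id
    WHERE B.val > 10 AND C.val > 10 AND D.val > 10
    ORDER BY A.id
""")

# Left joins with the conditions moved into ON: no row of A is dropped.
left_ids = a_ids("""
    SELECT DISTINCT A.id FROM A
    LEFT JOIN B ON A.refB_id = B.id AND B.val > 10
    LEFT JOIN C ON A.refC_id = C.id AND C.val > 10
    LEFT JOIN D ON A.refD_id = D.id AND D.val > 10
    ORDER BY A.id
""")

# Requiring a match on every joined table restores the original filter.
left_filtered_ids = a_ids("""
    SELECT DISTINCT A.id FROM A
    LEFT JOIN B ON A.refB_id = B.id AND B.val > 10
    LEFT JOIN C ON A.refC_id = C.id AND C.val > 10
    LEFT JOIN D ON A.refD_id = D.id AND D.val > 10
    WHERE B.id IS NOT NULL AND C.id IS NOT NULL AND D.id IS NOT NULL
    ORDER BY A.id
""")

print(inner_ids)          # [1, 4]
print(left_ids)           # [1, 2, 3, 4]
print(left_filtered_ids)  # [1, 4]
```

On this sample data the LEFT JOIN form returns all of A, so it answers the original question only if the unmatched rows are filtered out again, at which point it is equivalent to the inner join.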
Answer 1: (score: 3)
I would suggest using IN or EXISTS:
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE A.refB_id = B.id AND <condition on B>) AND
EXISTS (SELECT 1 FROM C WHERE A.refC_id = C.id AND <condition on C>) AND
EXISTS (SELECT 1 FROM D WHERE A.refD_id = D.id AND <condition on D>);
Advantages of this approach:
- No cost of removing duplicates, as there is with SELECT DISTINCT.
- Most databases handle EXISTS well.
Edit:
You can also write this using IN with subqueries:
SELECT A.*
FROM A
WHERE A.refB_id IN (SELECT B.id FROM B WHERE <condition on B>) AND
A.refC_id IN (SELECT C.id FROM C WHERE <condition on C>) AND
A.refD_id IN (SELECT D.id FROM D WHERE <condition on D>);
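As a quick sanity check, the sketch below runs the join, EXISTS, and IN-subquery forms against the same hypothetical data with Python's sqlite3 and compares the results. The `val` column, the rows, and the `val > 10` conditions are assumptions. Row 5 has a NULL reference to show that all three forms exclude it.

```python
import sqlite3

# Hypothetical schema and data: the `val` column and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE D (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY,
                    refB_id INTEGER, refC_id INTEGER, refD_id INTEGER);
    INSERT INTO B VALUES (1, 5), (2, 50);
    INSERT INTO C VALUES (1, 5), (2, 50);
    INSERT INTO D VALUES (1, 5), (2, 50);
    INSERT INTO A VALUES (1, 2, 2, 2), (2, 2, 2, 1), (3, 1, 2, 2),
                         (4, 2, 2, 2), (5, 2, 2, NULL);
""")

def a_ids(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

join_ids = a_ids("""
    SELECT DISTINCT A.id FROM A
    JOIN B ON A.refB_id = B.id
    JOIN C ON A.refC_id = C.id
    JOIN D ON A.refD_id = D.id
    WHERE B.val > 10 AND C.val > 10 AND D.val > 10
    ORDER BY A.id
""")

exists_ids = a_ids("""
    SELECT A.id FROM A
    WHERE EXISTS (SELECT 1 FROM B WHERE A.refB_id = B.id AND B.val > 10)
      AND EXISTS (SELECT 1 FROM C WHERE A.refC_id = C.id AND C.val > 10)
      AND EXISTS (SELECT 1 FROM D WHERE A.refD_id = D.id AND D.val > 10)
    ORDER BY A.id
""")

in_ids = a_ids("""
    SELECT A.id FROM A
    WHERE A.refB_id IN (SELECT B.id FROM B WHERE B.val > 10)
      AND A.refC_id IN (SELECT C.id FROM C WHERE C.val > 10)
      AND A.refD_id IN (SELECT D.id FROM D WHERE D.val > 10)
    ORDER BY A.id
""")

print(join_ids, exists_ids, in_ids)  # [1, 4] [1, 4] [1, 4]
```

Unlike Variant 2 from the question, the IN form here keeps the id lists inside the database, so nothing is shipped to the program and back. Which of the three forms gets the best plan depends on the database and its indexes, so comparing the actual execution plans on real data is the safer way to decide.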