I have two tables in MySQL that I am comparing, with the following attributes:
tbl_fac:
    facility_id  chemical_id  criteria
    10           25           50
    10           26           60
    10           27           60
    11           25           30
    11           27           31
    etc.
tbl_samp:
    sample_id  chemical_id  result
    5          25           51
    5          26           61
    6          25           51
    6          26           61
    6          27           500
    etc.
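These tables are joined on chemical_id (many-to-many, ugh). There are several thousand facility_ids, each with several hundred chemical_ids, and several thousand sample_ids, each with several hundred chemical_ids. All told, there are about 500,000 records in tbl_fac and 1,000,000 records in tbl_samp.
I'm trying to extract three sets of sample_ids from this dataset:
Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e. the result exceeds the criteria)
Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria and every tbl_fac.chemical_id is present for that sample_id (i.e. the result is below the criteria and everything is there)
Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria but one or more tbl_fac.chemical_ids are missing from the sample_id (i.e. the result is below the criteria, but something is missing)
Here's the question: how do I efficiently get all three groups in one query?
I've tried: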
select *
from tbl_fac
left join tbl_samp
on tbl_fac.chemical_id = tbl_samp.chemical_id
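but this only yields values that are missing from the entire dataset (rather than from individual samples). I do have a hackish query working that uses a third table to join tbl_fac and tbl_samp, but it's so ugly I'm actually embarrassed to post it....
As always, many thanks in advance for your thoughts on this one!
Cheers,
Josh
EDIT: Ideally, I'd like the sample_id and the Group returned, with only one Group per sample ID (my knowledge of the data indicates that each sample will always fall into one of the three categories above).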
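To make the LEFT JOIN observation concrete, here is a minimal sketch that loads the sample rows from the question and compares the plain LEFT JOIN with a per-sample version. It uses Python's sqlite3 module with an in-memory SQLite database purely as a runnable stand-in for MySQL; the alias `ids` and the variable names are illustrative assumptions, not part of the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_fac (facility_id INT, chemical_id INT, criteria INT);
CREATE TABLE tbl_samp (sample_id INT, chemical_id INT, result INT);
INSERT INTO tbl_fac VALUES
    (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);
INSERT INTO tbl_samp VALUES
    (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500);
""")

# Plain LEFT JOIN: a tbl_fac row is unmatched only if its chemical_id
# appears in no sample at all.
unmatched = conn.execute("""
    SELECT COUNT(*)
    FROM tbl_fac f
    LEFT JOIN tbl_samp s ON f.chemical_id = s.chemical_id
    WHERE s.sample_id IS NULL
""").fetchone()[0]

# Per-sample check: pair every distinct sample_id with every tbl_fac row,
# then look for that chemical within that one sample.
missing_per_sample = conn.execute("""
    SELECT ids.sample_id, f.facility_id, f.chemical_id
    FROM (SELECT DISTINCT sample_id FROM tbl_samp) ids
    CROSS JOIN tbl_fac f
    LEFT JOIN tbl_samp s
           ON s.sample_id = ids.sample_id
          AND s.chemical_id = f.chemical_id
    WHERE s.sample_id IS NULL
    ORDER BY ids.sample_id, f.facility_id, f.chemical_id
""").fetchall()

print(unmatched)
print(missing_per_sample)
```

On these rows the plain LEFT JOIN finds no unmatched chemicals, because chemical 27 exists in sample 6, while the per-sample version reports chemical 27 as missing from sample 5 for both facilities. Note that the cross join pairs every sample with every tbl_fac row, which may be expensive at the table sizes described above.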
Answer 0 (score: 1)
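This answer assumes there is a unique constraint on facility_id and chemical_id in tbl_fac, and a unique constraint on sample_id and chemical_id in tbl_samp. What I did was build the query up one step at a time. Whether it is efficient remains to be seen.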
Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e. the result exceeds the criteria)
SELECT tbl_samp.sample_id,
       'ResultsGreaterThanCriteria' AS samplegroup
FROM tbl_fac
     INNER JOIN tbl_samp
             ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE tbl_samp.result > tbl_fac.criteria
GROUP BY tbl_samp.sample_id
Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria and every tbl_fac.chemical_id is present for that sample_id (i.e. the result is below the criteria and everything is there)
SELECT tbl_samp.sample_id,
       'ResultLessThanCriteriaAndAllChems' AS samplegroup
FROM tbl_fac
     INNER JOIN tbl_samp
             ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE tbl_samp.result < tbl_fac.criteria
  -- The sample_id correlation belongs in the ON clause: placed in the WHERE
  -- clause it can never be true for a NULL-extended (unmatched) row.
  AND NOT EXISTS (SELECT *
                  FROM tbl_fac tf
                       LEFT JOIN tbl_samp ts
                              ON tf.chemical_id = ts.chemical_id
                             AND ts.sample_id = tbl_samp.sample_id
                  WHERE ts.chemical_id IS NULL)
GROUP BY tbl_samp.sample_id
Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria but one or more tbl_fac.chemical_ids are missing from the sample_id (i.e. the result is below the criteria, but something is missing)
SELECT tbl_samp.sample_id,
       'ResultsLessThanCriteriaWithMissingChems' AS samplegroup
FROM tbl_fac
     INNER JOIN tbl_samp
             ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE tbl_samp.result < tbl_fac.criteria
  -- Same correlation fix as in the Group 2 query.
  AND EXISTS (SELECT *
              FROM tbl_fac tf
                   LEFT JOIN tbl_samp ts
                          ON tf.chemical_id = ts.chemical_id
                         AND ts.sample_id = tbl_samp.sample_id
              WHERE ts.chemical_id IS NULL)
GROUP BY tbl_samp.sample_id
Finally, you combine all three queries with UNION to get a single result set.
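The combination step can be sketched as a single UNION over the three per-group queries. The example below is only an illustration under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, it places the sample_id correlation in the subquery's ON clause, and it adds hypothetical samples 7 and 8 (not in the question) because the question's own rows all land in group 1. The names `MISSING` and `UNION_QUERY` are made up for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_fac (facility_id INT, chemical_id INT, criteria INT);
CREATE TABLE tbl_samp (sample_id INT, chemical_id INT, result INT);
INSERT INTO tbl_fac VALUES
    (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);
INSERT INTO tbl_samp VALUES
    (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
    -- Hypothetical rows (not in the question) so groups 2 and 3 are non-empty.
    (7, 25, 10), (7, 26, 10), (7, 27, 10),
    (8, 25, 10);
""")

# True when at least one tbl_fac chemical has no row for the outer sample s.
MISSING = """
EXISTS (SELECT 1
        FROM tbl_fac tf
        LEFT JOIN tbl_samp ts
               ON tf.chemical_id = ts.chemical_id
              AND ts.sample_id = s.sample_id
        WHERE ts.chemical_id IS NULL)
"""

# One branch per group, glued together with UNION.
UNION_QUERY = f"""
SELECT s.sample_id, 'ResultsGreaterThanCriteria' AS samplegroup
FROM tbl_fac f JOIN tbl_samp s ON f.chemical_id = s.chemical_id
WHERE s.result > f.criteria
GROUP BY s.sample_id
UNION
SELECT s.sample_id, 'ResultLessThanCriteriaAndAllChems'
FROM tbl_fac f JOIN tbl_samp s ON f.chemical_id = s.chemical_id
WHERE s.result < f.criteria AND NOT {MISSING}
GROUP BY s.sample_id
UNION
SELECT s.sample_id, 'ResultsLessThanCriteriaWithMissingChems'
FROM tbl_fac f JOIN tbl_samp s ON f.chemical_id = s.chemical_id
WHERE s.result < f.criteria AND {MISSING}
GROUP BY s.sample_id
"""

# Sort in Python so the output order does not depend on the engine.
rows = sorted(conn.execute(UNION_QUERY).fetchall())
for row in rows:
    print(row)
```

With this data, samples 5 and 6 land in the first group, sample 7 in the second, and sample 8 in the third. A sample that has some results above and some below the criteria would show up in more than one branch; the question states that this does not occur in the real data, so the sketch does not guard against it.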
Answer 1 (score: 1)
SELECT
    sample_id,
    IF(result = criteria, -1, /* unspecified behavior */
        IF(result > criteria, 1,
            IF(nb_chemicals = total_nb_chemicals, 2, 3))) AS grp
FROM (
    SELECT s.result, s.sample_id, f.criteria, f.chemical_id,
           COUNT(DISTINCT f.chemical_id) AS nb_chemicals
    FROM tbl_fac f JOIN tbl_samp s
        ON f.chemical_id = s.chemical_id
    GROUP BY s.sample_id
) t
CROSS JOIN (
    SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
    FROM tbl_fac
) u
New solution:
SELECT
    s.sample_id,
    IF(s.result = f.criteria, -1, /* unspecified behavior */
        IF(s.result > f.criteria, 1,
            IF(sample_nb_chemicals = total_nb_chemicals, 2, 3))) AS grp
FROM
    tbl_fac f JOIN tbl_samp s
        ON f.chemical_id = s.chemical_id
    JOIN (
        SELECT s.sample_id,
               COUNT(DISTINCT f.chemical_id) AS sample_nb_chemicals
        FROM tbl_fac f JOIN tbl_samp s
            ON f.chemical_id = s.chemical_id
        GROUP BY s.sample_id
    ) u
        ON s.sample_id = u.sample_id
    CROSS JOIN (
        SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
        FROM tbl_fac
    ) v
/* sample_id is qualified because both s and u expose a column of that name */
GROUP BY s.sample_id, grp
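The counting idea behind these queries (compare the number of distinct chemicals a sample matches against the total number of distinct chemicals in tbl_fac) can also be sketched so that each sample gets exactly one row. This is a variation for illustration, not the answerer's query. It assumes that any exceedance puts a sample in group 1, lets ties (result = criteria) fall through to groups 2 and 3, runs on an in-memory SQLite database via Python's sqlite3 rather than MySQL, and adds hypothetical samples 7 and 8 that are not in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_fac (facility_id INT, chemical_id INT, criteria INT);
CREATE TABLE tbl_samp (sample_id INT, chemical_id INT, result INT);
INSERT INTO tbl_fac VALUES
    (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);
INSERT INTO tbl_samp VALUES
    (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
    -- Hypothetical rows (not in the question) so groups 2 and 3 are non-empty.
    (7, 25, 10), (7, 26, 10), (7, 27, 10),
    (8, 25, 10);
""")

rows = conn.execute("""
    SELECT s.sample_id,
           CASE
               -- Any result above any matching criteria: group 1.
               WHEN MAX(s.result > f.criteria) = 1 THEN 1
               -- Sample covers every distinct chemical in tbl_fac: group 2.
               WHEN COUNT(DISTINCT f.chemical_id) =
                    (SELECT COUNT(DISTINCT chemical_id) FROM tbl_fac) THEN 2
               -- Otherwise at least one chemical is missing: group 3.
               ELSE 3
           END AS grp
    FROM tbl_fac f
    JOIN tbl_samp s ON f.chemical_id = s.chemical_id
    GROUP BY s.sample_id
    ORDER BY s.sample_id
""").fetchall()

for row in rows:
    print(row)
```

Because the grouping is on sample_id alone and the group is derived from aggregates, each sample appears once, which matches the "one Group per sample ID" requirement in the question's edit. Like the queries above, the sketch counts chemicals across all facilities, since the question does not say how a sample is tied to a facility.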