MySQL >, <, and missing groups

Time: 2012-03-06 05:22:01

Tags: mysql group-by missing-data

I have two tables in MySQL that I am comparing, with the following attributes:

tbl_fac : facility_id, chemical_id, criteria
             10      , 25         , 50
             10      , 26         , 60
             10      , 27         , 60
             11      , 25         , 30
             11      , 27         , 31 
              etc...

tbl_samp: sample_id, chemical_id, result
            5     ,    25         , 51
            5     ,    26         , 61
            6     ,    25         , 51
            6     ,    26         , 61
            6     ,    27         , 500

              etc.... 

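These tables are joined by chemical_id (many-to-many, ugh). There are a few thousand facility_id's, each with a few hundred chemical_id's, and a few thousand sample_id's, each with a few hundred chemical_id's. All told, tbl_fac holds about 500,000 records and tbl_samp about 1,000,000.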
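To make the example reproducible, the two tables and the sample rows listed above could be set up roughly as follows. This is only a sketch: the column types, the composite primary keys, and the index names idx_fac_chemical and idx_samp_chemical are assumptions, since the question gives column names only.

```sql
-- Assumed schema: types, keys, and index names are illustrative guesses.
CREATE TABLE tbl_fac (
    facility_id INT NOT NULL,
    chemical_id INT NOT NULL,
    criteria    INT NOT NULL,
    PRIMARY KEY (facility_id, chemical_id),
    KEY idx_fac_chemical (chemical_id)      -- supports the join on chemical_id
);

CREATE TABLE tbl_samp (
    sample_id   INT NOT NULL,
    chemical_id INT NOT NULL,
    result      INT NOT NULL,
    PRIMARY KEY (sample_id, chemical_id),
    KEY idx_samp_chemical (chemical_id)     -- supports the join on chemical_id
);

-- The rows shown in the question (the "etc..." rows are not reproduced).
INSERT INTO tbl_fac (facility_id, chemical_id, criteria) VALUES
    (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);

INSERT INTO tbl_samp (sample_id, chemical_id, result) VALUES
    (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500);
```

With these rows, sample 5 has no row for chemical 27, while sample 6 has a row for every chemical_id that appears in tbl_fac.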

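I am trying to pull three groups of sample_id's out of this dataset:

Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e. the result exceeds the criteria)

Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria, and all tbl_fac.chemical_id's are present in that sample_id (i.e. the results are below the criteria and everything is there)

Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria, but one or more tbl_fac.chemical_id's are missing from the sample_id (i.e. the results are below the criteria, but something is missing)

Here is the question: how do I efficiently get all three groups in one query?

I have tried: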

select * 
from tbl_fac 
left join tbl_samp 
    on tbl_fac.chemical_id = tbl_samp.chemical_id

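But this only yields values that are missing from the dataset as a whole, not from individual samples. I have a hackish query working that uses a third table to join tbl_fac and tbl_samp, but it is so ugly that I am honestly embarrassed to post it.

As always, many thanks for your thoughts on this problem!

Cheers,

Josh

Edit: Ideally, I would like sample_id and Group returned, with only one group per sample ID (my knowledge of the data indicates that every sample will always fall into one of the three categories above).

2 Answers:

Answer 0: (score: 1)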

This answer assumes there is a unique constraint on facility_id, chemical_id in tbl_fac and a unique constraint on sample_id, chemical_id in tbl_samp. What I did was build the query one step at a time. Whether it is efficient remains to be seen.

Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e. the result exceeds the criteria)

SELECT tbl_samp.sample_id,
       'ResultsGreaterThanCriteria' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result > tbl_fac.criteria
GROUP  BY tbl_samp.sample_id

Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria, and all tbl_fac.chemical_id's are present in that sample_id (i.e. the results are below the criteria and everything is there)

SELECT tbl_samp.sample_id,
       'ResultLessThanCriteriaAndAllChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       -- The sample correlation lives in ON so that chemicals this sample
       -- lacks survive the LEFT JOIN as NULL-extended rows.
       AND NOT EXISTS (SELECT *
                       FROM   tbl_fac tf
                              LEFT JOIN tbl_samp ts
                                ON tf.chemical_id = ts.chemical_id
                                   AND ts.sample_id = tbl_samp.sample_id
                       WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id

Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria, but one or more tbl_fac.chemical_id's are missing from the sample_id (i.e. the results are below the criteria, but something is missing)

SELECT tbl_samp.sample_id,
       'ResultsLessThanCriteriaWithMissingChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       -- Same correlated check as for group 2, but requiring at least one
       -- chemical from tbl_fac that this sample lacks.
       AND EXISTS (SELECT *
                   FROM   tbl_fac tf
                          LEFT JOIN tbl_samp ts
                            ON tf.chemical_id = ts.chemical_id
                               AND ts.sample_id = tbl_samp.sample_id
                   WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id

Finally, you combine all three queries together to get the full result.
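The combined query itself is not included in this copy of the answer. Assuming it simply stacks the three per-group queries with UNION ALL, a sketch could look like the following. Note that the sample correlation sits in the ON clause of the LEFT JOIN: if it were tested in WHERE next to ts.chemical_id IS NULL, the NULL-extended ts.sample_id could never match and the subquery would always be empty.

```sql
-- Sketch only: assumes the three per-group queries are combined with UNION ALL.
SELECT tbl_samp.sample_id,
       'ResultsGreaterThanCriteria' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result > tbl_fac.criteria
GROUP  BY tbl_samp.sample_id

UNION ALL

SELECT tbl_samp.sample_id,
       'ResultLessThanCriteriaAndAllChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       -- No chemical in tbl_fac is absent from this sample.
       AND NOT EXISTS (SELECT *
                       FROM   tbl_fac tf
                              LEFT JOIN tbl_samp ts
                                ON tf.chemical_id = ts.chemical_id
                                   AND ts.sample_id = tbl_samp.sample_id
                       WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id

UNION ALL

SELECT tbl_samp.sample_id,
       'ResultsLessThanCriteriaWithMissingChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       -- At least one chemical in tbl_fac is absent from this sample.
       AND EXISTS (SELECT *
                   FROM   tbl_fac tf
                          LEFT JOIN tbl_samp ts
                            ON tf.chemical_id = ts.chemical_id
                               AND ts.sample_id = tbl_samp.sample_id
                   WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id;
```

A sample that has some results above the criteria and others below it shows up in more than one branch, so this form returns one row per sample only if the data really falls cleanly into the three categories, as the question states.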

Answer 1: (score: 1)

SELECT
    sample_id,
    IF(result = criteria, -1,  /* unspecified behavior */
     IF(result > criteria, 1,
      IF(nb_chemicals = total_nb_chemicals, 2, 3))) AS grp

FROM (
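    -- s.result, f.criteria and f.chemical_id are not aggregated, so MySQL takes
    -- them from an arbitrary row of each sample_id group (and rejects the
    -- query when ONLY_FULL_GROUP_BY is enabled).
    SELECT s.result, s.sample_id, f.criteria, f.chemical_id,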
        COUNT(DISTINCT f.chemical_id) AS nb_chemicals
    FROM tbl_fac f JOIN tbl_samp s
        ON f.chemical_id = s.chemical_id
    GROUP BY s.sample_id
) t 

CROSS JOIN (
    SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
    FROM tbl_fac
) u

New solution:

SELECT
    s.sample_id,
    IF(s.result = f.criteria, -1,  /* unspecified behavior */
     IF(s.result > f.criteria, 1,
      IF(sample_nb_chemicals = total_nb_chemicals, 2, 3))) AS grp

FROM
    tbl_fac f JOIN tbl_samp s
    ON f.chemical_id = s.chemical_id

    JOIN (
        SELECT s.sample_id, 
               COUNT(DISTINCT f.chemical_id) AS sample_nb_chemicals
        FROM tbl_fac f JOIN tbl_samp s
             ON f.chemical_id = s.chemical_id
        GROUP BY s.sample_id
    ) u
       ON s.sample_id = u.sample_id

    CROSS JOIN (
        SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
        FROM tbl_fac
    ) v

GROUP BY s.sample_id, grp
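Because the query above groups by both the sample and grp, a sample whose rows evaluate to different grp values comes back more than once. If exactly one row per sample_id is wanted, one possible variation is to decide the group with conditional aggregation. This is a sketch rather than part of the original answer; like the queries above it compares each sample against every tbl_fac row sharing a chemical_id, and it treats the distinct chemical_id's in tbl_fac as the full list a sample should contain.

```sql
-- Sketch: one row per sample, group decided by aggregates over the joined rows.
SELECT s.sample_id,
       CASE
           -- Group 1: at least one result exceeds a matching criteria.
           WHEN SUM(s.result > f.criteria) > 0 THEN 1
           -- Group 2: nothing exceeds, and every chemical in tbl_fac is present.
           WHEN COUNT(DISTINCT s.chemical_id) =
                (SELECT COUNT(DISTINCT chemical_id) FROM tbl_fac) THEN 2
           -- Group 3: nothing exceeds, but at least one chemical is missing.
           ELSE 3
       END AS grp
FROM   tbl_samp s
       JOIN tbl_fac f
         ON f.chemical_id = s.chemical_id
GROUP  BY s.sample_id;
```

For the sample rows in the question, both sample 5 and sample 6 land in group 1, since 51 > 50 for chemical 25 at facility 10. Results that equal the criteria are not called out separately here and fall through to group 2 or 3. Samples that share no chemical_id with tbl_fac drop out of the inner join entirely.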