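Selecting grouped rows that have a specific attribute

Time: 2016-09-26 15:21:18

Tags: sql postgresql

I am trying to select only certain rows, based on whether their group contains a particular attribute. Here is a sample of the data I am working with: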

src_id                                              cand_source
------                                              -----------
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ  mcp
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs  mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs  mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0  mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0  mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc  mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc  mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4  mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mcp
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM  mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM  mcp

What I want to do is select only the src_ids that have at least one cand_source equal to mcp. Here is what I tried:

SELECT *
FROM schema.table
WHERE src_id IN (
    SELECT src_id
    FROM schema.table
    WHERE batch_id = ?
    GROUP BY src_id
    HAVING count(cand_source = 'mcp') > 1
)
ORDER BY src_id, match_score DESC

However, this returns src_id clusters that have no cand_source equal to mcp.

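It was pointed out that I was simply overcomplicating things. (The solution query that followed here is not preserved in this copy; see the answers below.)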

2 answers:

Answer 0: (score: 1)

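If you only want the source ids, then your subquery is all you need. However, you need to count the number of matching values, not the number of rows. Here is the verbose logic (compared against 0, since the goal is at least one mcp row):

SELECT src_id
FROM schema.table
WHERE batch_id = ?
GROUP BY src_id
HAVING SUM(CASE WHEN cand_source = 'mcp' THEN 1 ELSE 0 END) > 0

A more concise version is below. The parentheses around the comparison matter, because :: binds more tightly than = and would otherwise try to cast 'mcp' to an integer:

HAVING SUM((cand_source = 'mcp')::int) > 0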
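
The reason count(cand_source = 'mcp') lets every group through is that COUNT counts each non-NULL value, and a false comparison is still non-NULL. The following is a minimal sketch of that difference, not a definitive reproduction: it uses Python's sqlite3 module as a stand-in for PostgreSQL, and the table name candidates and its two short groups are hypothetical.

```python
import sqlite3

# Hypothetical miniature of the question's data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (src_id TEXT, cand_source TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?)",
    [("a", "mcp"), ("a", "mc2"), ("b", "mc2"), ("b", "mc2"), ("b", "mc2")],
)

# COUNT(expr) counts rows where expr is not NULL, so false comparisons count too.
counted = conn.execute(
    "SELECT src_id, COUNT(cand_source = 'mcp') "
    "FROM candidates GROUP BY src_id ORDER BY src_id"
).fetchall()

# A conditional SUM adds 1 only for the rows that actually match.
summed = conn.execute(
    "SELECT src_id, SUM(CASE WHEN cand_source = 'mcp' THEN 1 ELSE 0 END) "
    "FROM candidates GROUP BY src_id ORDER BY src_id"
).fetchall()

print(counted)  # [('a', 2), ('b', 3)]
print(summed)   # [('a', 1), ('b', 0)]
```

Group b has no mcp row at all, yet COUNT reports 3 for it, which is why the HAVING filter in the question did not exclude it.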

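Answer 1: (score: 1)

If you just want the src_ids that have mcp, a straightforward query with a WHERE clause is enough; there is no need for conditional aggregation or anything else.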

SELECT DISTINCT 
    src_id
FROM
    Table
WHERE
    cand_source = 'mcp'
    AND batch_id = ?

If you want all of the records for each src_id that has at least one mcp cand_source, you can join that result back to the table to get every record.

SELECT t.*
FROM
    Table t
INNER JOIN 
    (SELECT DISTINCT src_id
     FROM Table
     WHERE cand_source = 'mcp'
       AND batch_id = ? ) d ON t.src_id = d.src_id
                            AND t.batch_id = ?
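
As a rough check of the join approach, here is a small sketch run through Python's sqlite3 module as a stand-in for PostgreSQL. The table name candidates, the single-letter ids, and the batch numbers are all hypothetical. Group a has two mcp rows, which shows why the DISTINCT in the derived table matters: without it, every row of a would be returned twice.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates (src_id TEXT, cand_source TEXT, batch_id INTEGER)"
)
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [
        ("a", "mcp", 1), ("a", "mcp", 1), ("a", "mc2", 1),  # two 'mcp' rows
        ("b", "mc2", 1), ("b", "mc2", 1),                   # no 'mcp' row
        ("c", "mcp", 2),                                    # 'mcp', other batch
    ],
)

# Join every row of the batch to the distinct src_ids that contain 'mcp'.
rows = conn.execute(
    """
    SELECT t.src_id, t.cand_source
    FROM candidates t
    INNER JOIN (
        SELECT DISTINCT src_id
        FROM candidates
        WHERE cand_source = 'mcp' AND batch_id = ?
    ) d ON t.src_id = d.src_id AND t.batch_id = ?
    ORDER BY t.src_id, t.cand_source
    """,
    (1, 1),
).fetchall()

print(rows)  # [('a', 'mc2'), ('a', 'mcp'), ('a', 'mcp')]
```

Only the three rows of group a come back: b has no mcp row, and c belongs to a different batch.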

Or you can do it with a Common Table Expression and the awesome window functions.

WITH cte AS
(
    SELECT *,
           COUNT(CASE WHEN cand_source = 'mcp' THEN cand_source END)
               OVER (PARTITION BY src_id) AS McpCount
    FROM
       Table
    WHERE
       batch_id = ?
)
SELECT *
FROM
    cte
WHERE
    McpCount > 0;
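
The window-function variant can be tried the same way. This is only a sketch under assumptions: it runs on Python's sqlite3 module instead of PostgreSQL (window functions need SQLite 3.25 or newer), and the table name candidates and its rows are hypothetical. The CASE has no ELSE branch, so non-matching rows yield NULL and COUNT skips them.

```python
import sqlite3

# Hypothetical sample data; SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (src_id TEXT, cand_source TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?)",
    [("a", "mcp"), ("a", "mc2"), ("b", "mc2"), ("b", "mc2")],
)

# Attach the per-group 'mcp' count to every row, then keep groups where it is > 0.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT src_id,
               cand_source,
               COUNT(CASE WHEN cand_source = 'mcp' THEN cand_source END)
                   OVER (PARTITION BY src_id) AS mcp_count
        FROM candidates
    )
    SELECT src_id, cand_source, mcp_count
    FROM cte
    WHERE mcp_count > 0
    ORDER BY src_id, cand_source
    """
).fetchall()

print(rows)  # [('a', 'mc2', 1), ('a', 'mcp', 1)]
```

Unlike the join, this reads the table once, and the count stays available on each row if it is needed later.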