Is there a more efficient way to write this query?

Time: 2013-12-23 20:32:59

Tags: mysql sql group-by

OK, imagine the following database structure:

USERS:
    id    |    name    |    company_id
     1         John            1
     2         Jane            1
     3         Jack            2
     4         Jill            3

COMPANIES:

    id    |    name
     1         CompanyA
     2         CompanyB
     3         CompanyC
     4         CompanyD

First, I want to select all companies that have more than one user:

SELECT
      `c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
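A minimal sketch to check this query against the sample data above. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in for MySQL so it runs without a server; `make_db` is a hypothetical helper, and the tables mirror the question's USERS and COMPANIES.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data.

    SQLite stands in for MySQL only so the sketch is runnable.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


conn = make_db()

# CompanyD has no users: the LEFT JOIN keeps it with a NULL u.id,
# COUNT(u.id) skips that NULL (count 0), and HAVING filters it out.
companies = conn.execute("""
    SELECT c.name
    FROM companies AS c
    LEFT JOIN users AS u ON c.id = u.company_id
    GROUP BY c.id
    HAVING COUNT(u.id) > 1
""").fetchall()
print(companies)  # [('CompanyA',)]
```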

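Easy enough. Now I want to select all users who belong to a company with more than one user. I have this combined query, but I don't think it is very efficient: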

SELECT * FROM `users` WHERE `company_id` IN (
   -- IN rather than =: more than one company can qualify
   SELECT
      `c`.`id`
   FROM `companies` AS `c`
   LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
   GROUP BY `c`.`id`
   HAVING COUNT(`u`.`id`) > 1
)

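Basically, I take the ids returned by the first query (companies with more than one user) and then query the users table to find all users of those companies.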
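A sketch of the combined query on the sample data, written with `IN` so the outer query still works when the subquery returns several ids. SQLite stands in for MySQL; `make_db` is a hypothetical helper and the extra user `Joe` is hypothetical, added only to give CompanyB a second user.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


# IN instead of =: with =, MySQL raises "Subquery returns more than 1 row"
# as soon as two companies qualify.
COMBINED = """
    SELECT * FROM users WHERE company_id IN (
        SELECT c.id
        FROM companies AS c
        LEFT JOIN users AS u ON c.id = u.company_id
        GROUP BY c.id
        HAVING COUNT(u.id) > 1
    )
    ORDER BY id
"""

conn = make_db()
before = conn.execute(COMBINED).fetchall()
print(before)  # [(1, 'John', 1), (2, 'Jane', 1)]

# Hypothetical extra user so that CompanyB also has two users.
conn.execute("INSERT INTO users VALUES (5, 'Joe', 2)")
after = conn.execute(COMBINED).fetchall()
print(after)  # [(1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (5, 'Joe', 2)]
```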

5 answers:

Answer 0 (score: 0)

Why not:

SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1

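Based on the data you say you need returned, you don't need any information from the companies table at all: "Now I want to select all users who belong to a company with more than one user."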
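A sketch of what this query returns on the sample data, with SQLite standing in for MySQL (`make_db` is a hypothetical helper). Because of the `GROUP BY`, the result has one row per qualifying company rather than one row per user, so only one of John and Jane comes back. MySQL 5.7.5 and later, with the default `ONLY_FULL_GROUP_BY` SQL mode, rejects this `SELECT *` with an error, since `name` and `id` are not functionally dependent on `company_id`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


conn = make_db()

# Answer 0's query as written. SQLite tolerates the non-grouped columns in
# SELECT * and takes one row's values per group.
rows = conn.execute(
    "SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1"
).fetchall()

# One row for the qualifying company (company_id 1), not one row per user;
# which of John or Jane appears is not specified.
print(len(rows))  # 1
```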

Answer 1 (score: 0)

Try this:

SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1

Answer 2 (score: 0)

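The simplest way to get just the users is probably to keep the subquery but drop the join. Since it is not a correlated subquery, it should be fairly efficient (an index on company_id obviously helps):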

SELECT u.* FROM USERS u WHERE company_id IN (
  SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);

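You could, for example, rewrite it as a LEFT JOIN, but I suspect that would actually be less efficient, since you will most likely need DISTINCT when using a JOIN: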

SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
  ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;

An SQLfiddle to test both
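A sketch comparing the two queries above on the sample data plus one hypothetical extra user (`Joe`, a third CompanyA user). SQLite stands in for MySQL and `make_db` is a hypothetical helper. With three users in one company, each user matches two others in the self-join, which is why the JOIN version needs `DISTINCT`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


conn = make_db()
# Hypothetical third user in CompanyA.
conn.execute("INSERT INTO users VALUES (5, 'Joe', 1)")

in_subquery = conn.execute("""
    SELECT u.* FROM users u WHERE company_id IN (
      SELECT company_id FROM users GROUP BY company_id HAVING COUNT(*) > 1
    )
""").fetchall()

self_join = conn.execute("""
    SELECT DISTINCT u.*
    FROM users u
    LEFT JOIN users u2
      ON u.company_id = u2.company_id AND u.id <> u2.id
    WHERE u2.id IS NOT NULL
""").fetchall()

# Same self-join without DISTINCT: one row per (user, other user) pair.
no_distinct = conn.execute("""
    SELECT COUNT(*)
    FROM users u
    LEFT JOIN users u2
      ON u.company_id = u2.company_id AND u.id <> u2.id
    WHERE u2.id IS NOT NULL
""").fetchone()[0]

# Neither query has an ORDER BY, so compare the rows after sorting.
print(sorted(in_subquery) == sorted(self_join))  # True
print(no_distinct)  # 6
```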

Answer 3 (score: 0)

Try a semi-join query:

SELECT *
FROM users u
WHERE EXISTS (
  SELECT null FROM users u1
  WHERE u.company_id=u1.company_id
    AND u.id <> u1.id
)

Demo: http://www.sqlfiddle.com/#!2/12dc34/2

Assuming id is the primary key column, creating an index on the company_id column will improve performance.

If you are really obsessed with the performance of this query, create a composite index on the columns company_id + id:

CREATE INDEX very_fast ON users( company_id, id );
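A sketch running the semi-join with the composite index from this answer, with SQLite standing in for MySQL (`make_db` is a hypothetical helper). The query plan is printed rather than asserted, since its wording depends on the SQLite version and says nothing definitive about MySQL's optimizer.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


conn = make_db()
# Composite index as suggested in the answer (same index name).
conn.execute("CREATE INDEX very_fast ON users (company_id, id)")

SEMI_JOIN = """
    SELECT *
    FROM users u
    WHERE EXISTS (
      SELECT NULL FROM users u1
      WHERE u.company_id = u1.company_id
        AND u.id <> u1.id
    )
"""

# A user qualifies if some *other* user shares the same company_id.
rows = sorted(conn.execute(SEMI_JOIN).fetchall())
print(rows)  # [(1, 'John', 1), (2, 'Jane', 1)]

# Show how SQLite plans the correlated subquery (output varies by version).
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + SEMI_JOIN):
    print(plan_row[-1])
```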

Answer 4 (score: 0)

Could you try this?

SELECT users.*
FROM users INNER JOIN
(
    SELECT company_id
    FROM users
    GROUP BY company_id
    HAVING COUNT(*) > 1
) x USING(company_id);

You should have an index: INDEX(company_id)
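A sketch of the derived-table join together with the suggested `company_id` index, with SQLite standing in for MySQL (`make_db` and the index name `idx_company_id` are hypothetical).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES
            (1, 'CompanyA'), (2, 'CompanyB'), (3, 'CompanyC'), (4, 'CompanyD');
        INSERT INTO users VALUES
            (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
    """)
    return conn


conn = make_db()
# The index the answer recommends; the name is hypothetical.
conn.execute("CREATE INDEX idx_company_id ON users (company_id)")

# The derived table x holds each company_id with more than one user;
# joining back on company_id returns every user of those companies.
rows = conn.execute("""
    SELECT users.*
    FROM users INNER JOIN
    (
        SELECT company_id
        FROM users
        GROUP BY company_id
        HAVING COUNT(*) > 1
    ) x USING (company_id)
    ORDER BY users.id
""").fetchall()
print(rows)  # [(1, 'John', 1), (2, 'Jane', 1)]
```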

Performance test

I have tested 3 of the queries from the answers.

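  1. Q1 = subquery (with GROUP BY) and INNER JOIN
  2. Q2 = LEFT JOIN with IS NOT NULL
  3. Q3 = EXISTS

    All queries return the same result. Tested with the TPC-H lineitem table; the question was "find the line items of orders that have more than one item".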

    Test results

    It depends on whether you want to retrieve only the first rows or all of the rows.

    • Q1 (first 10K rows): 2.85 sec
    • Q2 (first 10K rows): 0.03 sec
    • Q3 (first 10K rows): 0.03 sec

    • Q1 (all rows): 8.19 sec
    • Q2 (all rows): 34.12 sec
    • Q3 (all rows): 29.54 sec

    Schema and data

    mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
    *************************** 1. row ***************************
    COUNT(*): 11997996
    1 row in set (1.68 sec)
    
    mysql> SHOW CREATE TABLE lineitem\G
    *************************** 1. row ***************************
           Table: lineitem
    Create Table: CREATE TABLE `lineitem` (
      `l_orderkey` int(11) NOT NULL,
      `l_partkey` int(11) NOT NULL,
      `l_suppkey` int(11) NOT NULL,
      `l_linenumber` int(11) NOT NULL,
      `l_quantity` decimal(15,2) NOT NULL,
      `l_extendedprice` decimal(15,2) NOT NULL,
      `l_discount` decimal(15,2) NOT NULL,
      `l_tax` decimal(15,2) NOT NULL,
      `l_returnflag` char(1) NOT NULL,
      `l_linestatus` char(1) NOT NULL,
      `l_shipDATE` date NOT NULL,
      `l_commitDATE` date NOT NULL,
      `l_receiptDATE` date NOT NULL,
      `l_shipinstruct` char(25) NOT NULL,
      `l_shipmode` char(10) NOT NULL,
      `l_comment` varchar(44) NOT NULL,
      PRIMARY KEY (`l_orderkey`,`l_linenumber`),
      KEY `l_orderkey` (`l_orderkey`),
      KEY `l_partkey` (`l_partkey`,`l_suppkey`),
      CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
      CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    

    Queries

    Q1 FIRST 10K

    SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
    FROM lineitem u INNER JOIN
      (
        SELECT  l_orderkey
        FROM lineitem
        GROUP BY l_orderkey
        HAVING COUNT(*) > 1
      ) x USING (l_orderkey)
    LIMIT 10000;
    

    Q2 FIRST 10K

    SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
    FROM lineitem u
    LEFT JOIN lineitem u2
      ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
    WHERE u2.l_linenumber IS NOT NULL
    LIMIT 10000;
    

    Q3 FIRST 10K

    SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
    FROM lineitem u
    WHERE EXISTS (
      SELECT null FROM lineitem u1
      WHERE u.l_orderkey=u1.l_orderkey
        AND u.l_linenumber <> u1.l_linenumber
    )
    LIMIT 10000;
    

    Retrieving all rows

    Q1 ALL

    SELECT SQL_NO_CACHE COUNT(*)
    FROM lineitem u INNER JOIN
      (
        SELECT  l_orderkey
        FROM lineitem
        GROUP BY l_orderkey
        HAVING COUNT(*) > 1
      ) x USING (l_orderkey);
    

    Q2 ALL

    SELECT SQL_NO_CACHE COUNT(*)
    FROM lineitem u
    LEFT JOIN lineitem u2
      ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
    WHERE u2.l_linenumber IS NOT NULL;
    

    Q3 ALL

    SELECT SQL_NO_CACHE COUNT(*)
    FROM lineitem u
    WHERE EXISTS (
      SELECT null FROM lineitem u1
      WHERE u.l_orderkey=u1.l_orderkey
        AND u.l_linenumber <> u1.l_linenumber
    );
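The answer states that all queries return the same result. This sketch checks the three "ALL" queries on a tiny hypothetical lineitem-like table with only the two key columns, using SQLite as a stand-in for MySQL (`SQL_NO_CACHE` is dropped because it is MySQL-only). Q1 and Q3 agree, but Q2 ALL has no `DISTINCT`, so its `COUNT(*)` counts matching pairs of lines. On data where an order has more than two lines, that count is larger than Q1's and Q3's; wrapping the join in `SELECT DISTINCT` brings it back in line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of lineitem: only the columns the queries use.
conn.executescript("""
    CREATE TABLE lineitem (
        l_orderkey   INTEGER NOT NULL,
        l_linenumber INTEGER NOT NULL,
        PRIMARY KEY (l_orderkey, l_linenumber)
    );
    -- order 1 has 3 lines, order 2 has 1 line, order 3 has 2 lines
    INSERT INTO lineitem VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2);
""")

Q1_ALL = """
    SELECT COUNT(*)
    FROM lineitem u INNER JOIN
      (SELECT l_orderkey FROM lineitem
       GROUP BY l_orderkey HAVING COUNT(*) > 1) x USING (l_orderkey)
"""
Q2_ALL = """
    SELECT COUNT(*)
    FROM lineitem u
    LEFT JOIN lineitem u2
      ON u.l_orderkey = u2.l_orderkey AND u.l_linenumber <> u2.l_linenumber
    WHERE u2.l_linenumber IS NOT NULL
"""
# Q2 with the duplicates removed before counting.
Q2_DISTINCT = """
    SELECT COUNT(*) FROM (
      SELECT DISTINCT u.l_orderkey, u.l_linenumber
      FROM lineitem u
      LEFT JOIN lineitem u2
        ON u.l_orderkey = u2.l_orderkey AND u.l_linenumber <> u2.l_linenumber
      WHERE u2.l_linenumber IS NOT NULL
    ) AS d
"""
Q3_ALL = """
    SELECT COUNT(*)
    FROM lineitem u
    WHERE EXISTS (
      SELECT NULL FROM lineitem u1
      WHERE u.l_orderkey = u1.l_orderkey
        AND u.l_linenumber <> u1.l_linenumber
    )
"""

counts = {
    name: conn.execute(sql).fetchone()[0]
    for name, sql in [("Q1", Q1_ALL), ("Q2", Q2_ALL),
                      ("Q2 DISTINCT", Q2_DISTINCT), ("Q3", Q3_ALL)]
}
print(counts)  # {'Q1': 5, 'Q2': 8, 'Q2 DISTINCT': 5, 'Q3': 5}
```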