Question

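Overview:

I have three tables, subscribers, bios and shirt sizes, and I need to find the subscribers who have no bio or no shirt size.

The tables are laid out like this:

subscribers

| season_id |  user_id |

bio

| bio_id | user_id |

shirtsizes

| bio_id | shirtsize |

I need to find all users who have no bio or no shirt size (if there is no bio, then through the relationship there is no shirt size either).

I originally wrote a query like this: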

SELECT *
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

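but it now takes 10 seconds to complete.

I'd like to know how to restructure the query (or track down whatever the problem is) so that it performs reasonably.

Here is the MySQL EXPLAIN output (ogu = subscribers, b = bio, tn = shirtsize):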

| id | select_type | table | type  | possible_keys | key     | key_len | ref         | rows   | Extra       |   
+----+-------------+-------+-------+---------------+---------+---------+-------------+--------+-------------+    
|  1 | SIMPLE      | ogu   | ref   | PRIMARY       | PRIMARY | 4       | const       |    133 | Using where |
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 8       | NULL        | 187644 | Using index |
|  1 | SIMPLE      | tn    | ref   | nid           | nid     | 4       | waka2.b.nid |      1 | Using where |

The above is fairly sanitized; here is the real information:

mysql> DESCRIBE subscribers;
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| subscribers | int(11) | NO   | PRI |         |       |
| uid         | int(11) | NO   | PRI |         |       |


mysql> DESCRIBE bio;
+--------+------------------+------+-----+---------+-------+
| Field  | Type             | Null | Key | Default | Extra |
+--------+------------------+------+-----+---------+-------+
| bio_id | int(10) unsigned | NO   | PRI | 0       |       |
| uid    | int(10) unsigned | NO   | PRI | 0       |       |


mysql> DESCRIBE shirtsize;
+-----------+------------------+------+-----+---------+-------+
| Field     | Type             | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| bio_id    | int(10) unsigned | NO   | PRI | 0       |       |
| shirtsize | int(10) unsigned | NO   | PRI | 0       |       |

And the real query is as follows:

SELECT ogu.nid, ogu.is_active, ogu.uid, b.nid AS bio_node, tn.nid AS size
                  FROM og_uid ogu
                  LEFT JOIN bio b ON b.uid = ogu.uid
                  LEFT JOIN term_node tn ON tn.nid = b.nid
                  WHERE ogu.nid = 185033 AND ogu.is_admin = 0
                  AND (b.nid IS NULL OR tn.tid IS NULL)

nid is either the season_id or the bio_id (depending on the node type); term_node plays the role of the shirt size.

Answer 1

The query should be fine. I would run it through the query analyzer and optimize the indexes on the tables.
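In MySQL the "query analyzer" is the EXPLAIN statement. A minimal sketch, assuming the sanitized table and column names from the question (subscribers, bio, shirtsizes) and aliases chosen here for illustration: prefix the query with EXPLAIN, then list the indexes that actually exist on each joined table. On MySQL 8.0.18 and later, EXPLAIN ANALYZE additionally executes the query and reports actual row counts and timings.

```sql
-- Sketch: inspect the plan MySQL chooses (sanitized names from the question).
EXPLAIN
SELECT s.user_id
FROM subscribers s
LEFT JOIN bio b ON b.user_id = s.user_id
LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id
WHERE s.season_id = 185181
  AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

-- Then list the indexes the optimizer can choose from on each table.
SHOW INDEX FROM subscribers;
SHOW INDEX FROM bio;
SHOW INDEX FROM shirtsizes;
```

In the question's EXPLAIN output, the b row has type index, no usable possible_keys and an estimate of 187,644 rows, i.e. a scan of the whole index rather than a keyed lookup; that row is the most likely place to start optimizing.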

Answer 2

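Joins are one of the most expensive operations you can perform in an SQL query. Although the database should be able to optimize your query automatically to some extent, you can also try restructuring it. First, instead of SELECT *, I would specify exactly which columns you need from which relations. That will speed things up.

If you only need the user IDs, for example: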

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

This gives the SQL database more freedom to restructure the query for efficiency on its own.

Answer 3

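Obviously I haven't tested this, but it seems that what you want is to select any subscriber who has no matching bio, or for whom the join between bios and shirt sizes fails. I would consider using NOT EXISTS for this. You probably want indexes on bio.user_id and shirtsizes.bio_id.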

select *
from subscribers s
where s.season_id = 185181
      and not exists (select *
                      from bio join shirtsizes on bio.bio_id = shirtsizes.bio_id
                      where bio.user_id = s.user_id)

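Edit:

Based on your update, you may want to create separate keys on each column instead of, or in addition to, the composite primary keys. The joins may not be able to make full use of a composite primary index, and indexes on the join columns themselves could speed things up.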
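A sketch of what separate join-column indexes could look like, assuming the sanitized names from the question; the index names idx_bio_user_id and idx_shirtsizes_bio_id are made up for illustration. The EXPLAIN output in the question (type index on b, roughly 187,644 rows) is consistent with bio's composite primary key not starting with the user column: MySQL can only seek on a leftmost prefix of a multi-column index, so a lookup by user alone has nothing to seek on.

```sql
-- Hypothetical index names; sanitized table/column names from the question.
-- Lets the subscribers -> bio join look up bios by user_id instead of
-- scanning the entire composite primary key.
CREATE INDEX idx_bio_user_id ON bio (user_id);

-- Only needed if no existing index on shirtsizes starts with bio_id.
CREATE INDEX idx_shirtsizes_bio_id ON shirtsizes (bio_id);
```

After adding them, re-run EXPLAIN: if the new index on bio is used, its row should show type ref and key idx_bio_user_id instead of a full index scan.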

Answer 4

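If you define exactly what you're looking for instead of SELECT *, it might speed things up... Also, OR is not the fastest thing to have in a query; if you can rewrite it without the OR, it will be faster.

Also... could you try a union instead of left joins?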

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

would become something like:

(SELECT s.user_id FROM subscribers s WHERE s.season_id = 185181)
UNION
(SELECT b.user_id, b.bio_id FROM bio b WHERE bio.bio_id IS NULL)
UNION
(SELECT shirtsizes.bio_id FROM shirtsizes WHERE shirtsizes.size is NULL)

(To be honest, that looks wrong to me... but I never use ~~join or~~ JOIN syntax or unions...)

I would do:

SELECT *
FROM subscribers s, bio b, shirtsizes sh
WHERE s.season_id = 185181
AND sh.bio_id = b.bio_id 
AND b.user_id = s.user_id 
AND (b.bio_id IS NULL 
     OR 
     sh.shirtsize IS NULL);
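One way to act on the "rewrite it without the OR" advice, sketched with the sanitized names from the question (the aliases are chosen here): split the OR into two branches that cannot both hold and combine them with UNION ALL. The first branch keeps subscribers with no bio; the second keeps subscribers whose bio has no shirt size. A row from the first branch has no bio and a row from the second has one, so the branches never overlap, and UNION ALL skips the duplicate-elimination step that plain UNION would add.

```sql
-- Sketch: the OR condition split into two anti-joins (sanitized names).
-- Branch 1: subscribers in the season with no bio at all.
SELECT s.user_id
FROM subscribers s
LEFT JOIN bio b ON b.user_id = s.user_id
WHERE s.season_id = 185181
  AND b.bio_id IS NULL
UNION ALL
-- Branch 2: subscribers whose bio has no matching shirt size row.
SELECT s.user_id
FROM subscribers s
JOIN bio b ON b.user_id = s.user_id
LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id
WHERE s.season_id = 185181
  AND sh.bio_id IS NULL;
```

Whether this beats the single OR query depends on the available indexes, so it is worth comparing the two with EXPLAIN.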

Answer 5

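Is bio_id the primary key of bios? Is it really possible for a bios row to exist where b.user_id = subscribers.user_id but b.bio_id is NULL?

Are there shirtsizes rows where shirtsize.bio_id is NULL? Do those rows have a size that is NOT NULL?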
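These questions can be answered directly against the data. A sketch, assuming the sanitized names from the question: each count is the number of rows that would make one of the IS NULL tests true for a reason other than a missing join partner. Given the NOT NULL column definitions in the DESCRIBE output, all three should come back 0, in which case the IS NULL tests in the WHERE clause can only match when the LEFT JOIN found no row.

```sql
-- Sketch: sanity checks for the questions above (sanitized names assumed).
-- 1) bio rows whose bio_id is NULL.
SELECT COUNT(*) FROM bio WHERE bio_id IS NULL;

-- 2) shirtsizes rows with a NULL bio_id.
SELECT COUNT(*) FROM shirtsizes WHERE bio_id IS NULL;

-- 3) shirtsizes rows linked to a bio but with a NULL size.
SELECT COUNT(*) FROM shirtsizes WHERE bio_id IS NOT NULL AND shirtsize IS NULL;
```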

Answer 6

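Would it be faster to take the difference between the list of subscribers for the relevant season and the list of subscribers for that season who have both a bio and a shirt size?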

SELECT *
   FROM subscribers
   WHERE season_id = 185181
     AND user_id NOT IN
         (SELECT DISTINCT s.user_id
             FROM subscribers s
             JOIN bio b ON s.user_id = b.user_id
             JOIN shirtsizes z ON b.bio_id = z.bio_id
             WHERE s.season_id = 185181
         )

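This avoids the outer join, which is not as fast as an inner join, so it may be quicker. On the other hand, it may build two large lists with only a small difference between them. It is not clear whether the DISTINCT in the subquery helps or hurts performance: it implies a sort operation (expensive), but it paves the way for a merge join if the MySQL optimizer supports such a thing.

There may also be other notations available, for example MINUS or DIFFERENCE.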
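As a concrete instance of such a notation: MySQL 8.0.31 and later support EXCEPT (Oracle calls the same operation MINUS). A sketch using the sanitized names from the question; on older MySQL versions, the NOT IN form above is the way to express the difference.

```sql
-- Sketch for MySQL 8.0.31+ only; EXCEPT is a syntax error on older versions.
-- All subscribers in the season...
SELECT s.user_id
FROM subscribers s
WHERE s.season_id = 185181
EXCEPT
-- ...minus those who have both a bio and a shirt size.
SELECT s.user_id
FROM subscribers s
JOIN bio b ON b.user_id = s.user_id
JOIN shirtsizes sh ON sh.bio_id = b.bio_id
WHERE s.season_id = 185181;
```

EXCEPT removes duplicates by default, so, like the DISTINCT version above, each qualifying user_id appears once.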

Answer 7

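Right now, your query evaluates all bios and term_nodes (where they exist) and then filters them out.

But all you want is to find the og_uid rows that have no term_node (no bio also implies no term_node).

So you want to stop evaluating bio and term_node as soon as the first term_node is found:

SELECT  *
FROM    (
        SELECT  ogu.nid, ogu.is_active, ogu.uid,
                (
                SELECT  1
                FROM    bio b, term_node tn
                WHERE   b.uid = ogu.uid
                        AND tn.nid = b.nid
                LIMIT 1
                ) AS ex
        FROM    og_uid ogu
        WHERE   ogu.nid = 185033
                AND ogu.is_admin = 0
        ) ogu1
WHERE   ex IS NULL

This evaluates at most one bio and at most one term_node for each og_uid, instead of evaluating all of the existing thousands and filtering them out.

It should work faster.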

Answer 8

select * from subscribers where user_id not in (
  select user_id from bio where bio_id in (
    select bio_id from shirtsizes
  )
) and season_id=185181

Answer 9

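I assume your "big table" is subscribers, and that season_id is probably neither selective nor indexed (if it isn't selective, there is no point indexing it anyway), which means you will have to fully scan subscribers regardless. That aside, I would join the other two tables with an inner join. Note that if a bio_id has no row in shirt_size, then for your query it is exactly the same as having no bio. First part: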

select uid
from bio
     inner join shirtsizes
             on shirtsizes.bio_id = bio.bio_id

At this point you want to check that shirtsizes is indexed on bio_id. Now you can outer join this query to subscribers:

select *
from subscribers s
     left outer join (select uid
                      from bio
                      inner join shirtsizes
                              on shirtsizes.bio_id = bio.bio_id) x
                  on x.uid = s.uid
where s.season_id = 185181
  and x.uid is null

If bio and shirtsizes aren't huge, this will probably run fairly quickly...
