How can I check whether a value exists in another row?

Asked: 2012-10-26 17:29:44

Tags: mysql sql

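This is driving me crazy. I have a copy of the IMDb database that I dumped with imdbpy. I am trying to find American movies, by the first letter of the title, for which actor data is available.

Here is an example query that fetches the movies without the actor data. It is very fast: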

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
LIMIT 75

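The cast data is stored in a separate table named cast_info, which contains about 22 million records. The nr_order column holds the actor's billing order in the movie. For example, Tom Hanks would be 1 in Forrest Gump. Each movie_id usually has a few dozen rows.

So, to check whether actor data is available, there should be at least one row for that particular movie_id where nr_order is not null. If every nr_order value for a movie_id is null, the movie does not contain the data I need.

I tried to get this information with the following query: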
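As a minimal sketch of that rule for a single movie: COUNT(nr_order) counts only non-NULL values, so a result greater than zero means at least one billed cast row exists. The movie_id value 12345 is a placeholder for illustration, not an ID taken from the real data.

```sql
-- COUNT(column) ignores NULLs, so this yields 1 when at least one
-- cast_info row for the movie has a billing order, and 0 otherwise.
-- 12345 is a hypothetical movie_id used only for illustration.
SELECT COUNT(nr_order) > 0 AS has_cast_order
FROM cast_info
WHERE movie_id = 12345;
```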

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

INNER JOIN cast_info ON 
(cast_info.movie_id = title.id
AND
cast_info.nr_order = 1)

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
LIMIT 75

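For some reason the query becomes very slow. The first query takes 0.3-0.7 seconds, while the second takes about 6-10 seconds. I added an index on cast_info.nr_order, but it did not help.

EXPLAIN output: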

+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
| id | select_type | table     | type  | possible_keys                                    | key               | key_len | ref          | rows  | Extra                       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
|  1 | SIMPLE      | title     | range | PRIMARY,title_idx_title,fk_kind_type_id_4        |  title_idx_title  | 257     | NULL         | 132801| Using where; Using temporary|
|  1 | SIMPLE      | movie_info| ref   | ovie_info_idx_mid,info_type_id movie_info_idx_mid| movie_info_idx_mid| 4       | imdb.title.id| 4     | Using where; Distinct       |
|  1 | SIMPLE      | table1    | ref   | cast_info_idx_mid,nr_order                       | cast_info_idx_mid | 4       | imdb.title.id| 12    | Using where; Distinct       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+

Any ideas would be very helpful!

Edit: EXPLAIN output for the first query:

+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
| id | select_type | table     | type  | possible_keys                                    | key               | key_len | ref          | rows  | Extra                       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
|  1 | SIMPLE      | title     | range | PRIMARY,title_idx_title,fk_kind_type_id_4        |  title_idx_title  | 257     | NULL         | 132801| Using where; Using temporary|
|  1 | SIMPLE      | movie_info| ref   | ovie_info_idx_mid,info_type_id movie_info_idx_mid| movie_info_idx_mid| 4       | imdb.title.id| 4     | Using where; Distinct       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+

1 answer:

Answer 0 (score: 1)

Since you only care about whether cast information is available, you could try using EXISTS instead:

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
AND EXISTS(SELECT 1 FROM cast_info WHERE cast_info.movie_id = title.id AND cast_info.nr_order IS NOT NULL)
LIMIT 75
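
The subquery above filters cast_info on movie_id and nr_order together, while the EXPLAIN output in the question shows only the movie_id index being used for that table. One thing that might be worth trying, offered as an assumption rather than a measured result, is a composite index covering both columns. The index name below is hypothetical, and building it on a table of about 22 million rows may take a while.

```sql
-- Hypothetical index name. The join column comes first so that the
-- nr_order test can be answered from the index entries of each movie_id.
CREATE INDEX cast_info_idx_mid_nr ON cast_info (movie_id, nr_order);
```

Re-running EXPLAIN afterwards would show whether the optimizer actually picks the new index.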

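I'm not sure of the exact explanation for the behavior you are seeing, but the DISTINCT may be doing something odd with the large number of rows in the join, or at least in the product of the joins. Note the "Distinct" that is applied to the cast_info table in the EXPLAIN output.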
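
If the DISTINCT is indeed part of the problem, a variant worth trying is to express both lookups as EXISTS subqueries, so that no join multiplies the title rows and DISTINCT becomes unnecessary. This is an untested sketch that assumes title.id is unique per movie; whether it is faster will depend on the MySQL version and the available indexes.

```sql
-- Both lookups are existence checks, so each title row is returned at
-- most once and DISTINCT is not needed (assumes title.id is unique).
SELECT title.id, title.title, title.production_year
FROM title
WHERE title.title LIKE 'a%'
  AND title.kind_id = 1
  AND EXISTS (SELECT 1
              FROM movie_info
              WHERE movie_info.movie_id = title.id
                AND movie_info.info_type_id = 8
                AND movie_info.info = 'USA')
  AND EXISTS (SELECT 1
              FROM cast_info
              WHERE cast_info.movie_id = title.id
                AND cast_info.nr_order IS NOT NULL)
LIMIT 75;
```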