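This is driving me crazy. I dumped the IMDb database using IMDbPY. I'm trying to find US films by the first letter of their title, restricted to films that have cast data available.
Here is an example of the query that fetches the films without the cast data. It is very fast: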
SELECT DISTINCT title.id,title.title,title.production_year
FROM title
INNER JOIN movie_info ON
(movie_info.movie_id = title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info = 'USA')
WHERE title LIKE 'a%'
AND title.kind_id = 1
LIMIT 75
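The cast data is stored in a separate table called cast_info, which contains about 22 million records. The nr_order column holds the billing order of an actor within a film. For example, Tom Hanks would be 1 in Forrest Gump. Each movie_id usually has a few dozen rows.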
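So, to check whether cast data is available, at least one row for that particular movie_id must have a non-NULL nr_order. If every nr_order value for a movie_id is NULL, the film does not contain the data I need.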
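For a single film, this rule can be checked directly. A minimal sketch, where 12345 is a placeholder movie_id rather than a real ID from the dump; it relies on COUNT(column) counting only the rows where that column is not NULL:

```sql
-- 12345 is a placeholder movie_id used for illustration only.
-- COUNT(nr_order) skips NULLs, so the result is the number of cast rows
-- that carry a billing order; 0 means the film has no usable cast data.
SELECT COUNT(nr_order) AS ordered_cast_rows
FROM cast_info
WHERE movie_id = 12345;
```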
I tried the following query to get this information:
SELECT DISTINCT title.id,title.title,title.production_year
FROM title
INNER JOIN movie_info ON
(movie_info.movie_id = title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info = 'USA')
INNER JOIN cast_info ON
(cast_info.movie_id = title.id
AND
cast_info.nr_order = 1)
WHERE title LIKE 'a%'
AND title.kind_id = 1
LIMIT 75
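For some reason, this query is very slow. The first query takes 0.3-0.7 seconds, while the second takes about 6-10 seconds. I added an index on cast_info.nr_order, but it didn't help.
EXPLAIN output: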
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
| id | select_type | table      | type  | possible_keys                             | key                | key_len | ref           | rows   | Extra                        |
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
| 1  | SIMPLE      | title      | range | PRIMARY,title_idx_title,fk_kind_type_id_4 | title_idx_title    | 257     | NULL          | 132801 | Using where; Using temporary |
| 1  | SIMPLE      | movie_info | ref   | movie_info_idx_mid,info_type_id           | movie_info_idx_mid | 4       | imdb.title.id | 4      | Using where; Distinct        |
| 1  | SIMPLE      | table1     | ref   | cast_info_idx_mid,nr_order                | cast_info_idx_mid  | 4       | imdb.title.id | 12     | Using where; Distinct        |
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
Any ideas would be greatly appreciated!
Edit: EXPLAIN output from the first query:
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
| id | select_type | table      | type  | possible_keys                             | key                | key_len | ref           | rows   | Extra                        |
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
| 1  | SIMPLE      | title      | range | PRIMARY,title_idx_title,fk_kind_type_id_4 | title_idx_title    | 257     | NULL          | 132801 | Using where; Using temporary |
| 1  | SIMPLE      | movie_info | ref   | movie_info_idx_mid,info_type_id           | movie_info_idx_mid | 4       | imdb.title.id | 4      | Using where; Distinct        |
+----+-------------+------------+-------+-------------------------------------------+--------------------+---------+---------------+--------+------------------------------+
Answer 0 (score: 1)
Since you only care whether cast information is available or not, you could try using EXISTS instead:
SELECT DISTINCT title.id,title.title,title.production_year
FROM title
INNER JOIN movie_info ON
(movie_info.movie_id = title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info = 'USA')
WHERE title LIKE 'a%'
AND title.kind_id = 1
AND EXISTS(SELECT 1 FROM cast_info WHERE cast_info.movie_id = title.id AND cast_info.nr_order IS NOT NULL)
LIMIT 75
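The subquery above looks up cast_info by movie_id and then tests nr_order, so an index covering both columns might let MySQL answer it from the index alone. This is only a sketch: the index name cast_info_idx_mid_nr is made up, and the effect has not been measured against this dump.

```sql
-- Hypothetical index name; the columns come from the query above.
-- movie_id goes first because the subquery matches it by equality.
CREATE INDEX cast_info_idx_mid_nr ON cast_info (movie_id, nr_order);
```

Re-running EXPLAIN on the query afterwards would show whether the optimizer actually picks the new index.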
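I'm not sure of the exact explanation for the behavior you're seeing, but DISTINCT may be doing something odd when the join, or at least the product of the joins, has a lot of rows. Note that Distinct is being applied to the cast_info table in the EXPLAIN output.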
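If DISTINCT really is the culprit, one way to test that would be to remove the need for it. The sketch below moves the movie_info filter into a second EXISTS. Because EXISTS never multiplies rows, and title.id appears to be the primary key (PRIMARY is listed in possible_keys), each film can appear at most once, so DISTINCT can be dropped. It is untested against this dump, so treat it as something to try rather than a known fix.

```sql
-- Both filters are expressed as EXISTS, so no join can duplicate a title row
-- and DISTINCT is not needed.
SELECT title.id, title.title, title.production_year
FROM title
WHERE title.title LIKE 'a%'
  AND title.kind_id = 1
  AND EXISTS (SELECT 1
              FROM movie_info
              WHERE movie_info.movie_id = title.id
                AND movie_info.info_type_id = 8
                AND movie_info.info = 'USA')
  AND EXISTS (SELECT 1
              FROM cast_info
              WHERE cast_info.movie_id = title.id
                AND cast_info.nr_order IS NOT NULL)
LIMIT 75;
```

If the Using temporary entry disappears from the EXPLAIN output for this version, that would support the idea that DISTINCT was the expensive part.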