Question

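Please help me optimize my query. It runs far too slowly on my sample table (because of the large number of rows), and using it live in my social network project does not seem feasible.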

Background / motivation:

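I expect to have a few million profiles, so I filled a sample table, tbl_profile, with about 10 million users containing random data.
Each user can follow several interests. This is a many-to-many relationship between users (tbl_profile) and interests (tbl_interest), mapped through the table tbl_interest_subscription.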

These are the tables:

First, my profile table tbl_profile (10,499,307 rows):

id            mediumint(8)
gender_id     tinyint(3)
weight        tinyint(3)
height        tinyint(3)
num_children  tinyint(3)

Next, the interest table tbl_interest (30 sample rows):

id            smallint(5)
owner_id      mediumint(8)
name          varchar(25)

And here is the mapping table between users and interests, tbl_interest_subscription (262,482,675 sample rows):

id            int(10)
profile_id    mediumint(8)
interest_id   smallint(5)

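This table has additional columns (such as a creation timestamp) that I have left out because they would only clutter the overview. If they are relevant, I can of course show the full table.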
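For readers who want to reproduce the setup, the three tables could be sketched roughly as follows. This is only an approximation: the column names and integer types come from the listings above and the index names from the EXPLAIN output in this post, while the UNSIGNED and NOT NULL attributes, the storage engine, the index column order, and the omitted extra columns are assumptions.

```sql
-- Approximate schema sketch. UNSIGNED, NOT NULL, ENGINE and the column
-- order inside each index are assumptions, not taken from the real tables.
CREATE TABLE tbl_profile (
  id           MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  gender_id    TINYINT UNSIGNED   NOT NULL,
  weight       TINYINT UNSIGNED   NOT NULL,
  height       TINYINT UNSIGNED   NOT NULL,
  num_children TINYINT UNSIGNED   NOT NULL,
  PRIMARY KEY (id),
  KEY idx_weight_height_children (weight, height, num_children)
) ENGINE=InnoDB;

CREATE TABLE tbl_interest (
  id       SMALLINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  owner_id MEDIUMINT UNSIGNED NOT NULL,
  name     VARCHAR(25)        NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Mapping table for the many-to-many relationship (extra columns omitted).
CREATE TABLE tbl_interest_subscription (
  id          INT UNSIGNED       NOT NULL AUTO_INCREMENT,
  profile_id  MEDIUMINT UNSIGNED NOT NULL,
  interest_id SMALLINT UNSIGNED  NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY idx_profileID_interestID (profile_id, interest_id),
  KEY idx_profileID (profile_id),
  KEY idx_interestID (interest_id)
) ENGINE=InnoDB;
```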

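My query has the following goal: find users who match my preferences (height, weight, number of children, ...) and who share my interests, and sort the results by the number of interests they have in common with me. The query is as follows: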

SELECT map.profile_id, COUNT(map.interest_id) as num_matches_interests
FROM tbl_interest_subscription map

INNER JOIN
(
SELECT id
FROM tbl_profile
WHERE weight>=60 AND weight<=90
      AND height>=120 AND height<=190
      AND num_children=1
) AS profile ON profile.id=map.profile_id

WHERE map.interest_id IN (24, 25, 26, 27, 28)
GROUP BY map.profile_id
ORDER BY num_matches_interests DESC
LIMIT 20

The result of the EXPLAIN call is as follows:

id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2422568
Extra: Using temporary; Using filesort
***************************************
id: 2
select_type: PRIMARY
table: map
type: ref
possible_keys: idx_profileID_interestID (unique), idx_profileID, idx_interestID
key: idx_profileID_interestID
key_len: 3
ref: profile.id
rows: 12
Extra: Using where; Using index
***************************************
id: 3
select_type: DERIVED
table: tbl_profile
type: range
possible_keys: idx_weight_height_children
key: idx_weight_height_children
key_len: 10
ref: NULL
rows: 5249710
Extra: Using where; Using index

The query itself runs for a very long time (several minutes) on a cloud server at Digital Ocean with 2 GB of RAM and 2 CPUs (2.4 GHz each).

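I think the temporary table and the filesort are the main problem, since I am already using indexes. However, I do not know how to avoid them: I need to sort by the computed number of shared interests, and that value cannot be sorted using an existing index. While searching for optimization strategies I keep reading about partitioning, but I do not know how to partition my interest mapping table to my advantage.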
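One variant that might be worth trying is sketched below; it is untested on this data. Older MySQL versions may materialize a derived table in the FROM clause without any index, which would match the `<derived2>` / `type: ALL` row in the EXPLAIN output above. Writing the profile filter as a plain join returns the same rows while leaving the join order to the optimizer. The index idx_children_weight_height is hypothetical (its name and column order are assumptions); it places the equality column first so the range scan covers fewer entries. The temporary table and filesort for the computed count would most likely remain.

```sql
-- Hypothetical index (name and column order are assumptions):
-- equality column first, followed by the range columns.
ALTER TABLE tbl_profile
  ADD INDEX idx_children_weight_height (num_children, weight, height);

-- Same result as the original query, written as a plain join
-- instead of joining against a derived table.
SELECT map.profile_id, COUNT(map.interest_id) AS num_matches_interests
FROM tbl_profile AS profile
INNER JOIN tbl_interest_subscription AS map
        ON map.profile_id = profile.id
WHERE profile.num_children = 1
  AND profile.weight BETWEEN 60 AND 90
  AND profile.height BETWEEN 120 AND 190
  AND map.interest_id IN (24, 25, 26, 27, 28)
GROUP BY map.profile_id
ORDER BY num_matches_interests DESC
LIMIT 20;
```

Running EXPLAIN on this form would show whether the derived-table step disappears and which index is chosen for tbl_profile.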

Thanks!

Edit:

I was asked to run:

EXPLAIN EXTENDED SELECT map.profile_id, map.num_matches_interests 
FROM tbl_profile AS profile

INNER JOIN 
(
   SELECT profile_id, COUNT(interest_id) AS num_matches_interests 
   FROM tbl_interest_subscription 
   WHERE interest_id IN ( 24, 25, 26, 27, 28 ) 
   GROUP BY profile_id) AS map ON profile.id=map.profile_id 

WHERE profile.weight BETWEEN 60 AND 90 
AND profile.height BETWEEN 120 AND 190 
AND profile.num_children=1

This is the result:

id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10499307
filtered: 100.00
Extra:
******************
id: 2
select_type: PRIMARY
table: profile
type: eq_ref
possible_keys: PRIMARY, idx_id_weight_height, idx_weight_height_numchildren
key: PRIMARY
key_len: 3
ref: map.profile_id
rows: 1
filtered: 100.00
Extra: Using where
******************
id: 3
select_type: DERIVED
table: tbl_interest_subscription
type: index
possible_keys: interest_id
key: idx_profile_interest
key_len: 5
ref: NULL
rows: 262483161
filtered: 36.75
Extra: Using where; Using index

The EXPLAIN query itself ran for about 2-3 minutes.

MySQL query computing a many-to-many relationship on very large tables (> 10 million rows)
