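I need to run a query that selects two columns from a large table (3M+ rows; with two columns selected, the result set is around 6-7M values) and returns them as a single list. So I use UNION to merge the columns into one list while eliminating duplicates. The problem is that I can't return the result in a single query; I need to partition it, so I applied LIMIT ?,? to the subqueries, with the values set by the application layer through prepared statements.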
SELECT val
FROM
(
(SELECT fs.smr as val
FROM `fr_search` as fs
ORDER BY val LIMIT ?,?)
UNION
(SELECT fs.dmr as val
FROM `fr_search` as fs
ORDER BY val LIMIT ?,?)
) as vals
GROUP BY val
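Problem: UNION eliminates the duplicates, but only after LIMIT has been applied. That means if the two subqueries return 100 + 100 = 200 rows and most of them are duplicates, I get back fewer than 200 rows. How can I apply a limit to a query like this so that it returns a specific number of rows? (If I apply LIMIT after the subquery, on the outer query, it takes more than two minutes to run, so that doesn't solve the problem.)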
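To make the symptom concrete, here is a minimal sketch that reproduces it on toy data. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so the subqueries are wrapped in derived tables rather than parentheses; the ten-row data set and the helper name `limited_union` are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical toy data standing in for fr_search (the real table has 3M+ rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fr_search (smr INTEGER, dmr INTEGER)")
conn.executemany(
    "INSERT INTO fr_search (smr, dmr) VALUES (?, ?)",
    [(i, i) for i in range(1, 11)],  # both columns hold the values 1..10
)


def limited_union(conn, offset, count):
    """Apply LIMIT to each branch first, then let UNION remove duplicates."""
    sql = """
        SELECT val FROM (SELECT smr AS val FROM fr_search ORDER BY val LIMIT ?, ?)
        UNION
        SELECT val FROM (SELECT dmr AS val FROM fr_search ORDER BY val LIMIT ?, ?)
    """
    rows = conn.execute(sql, (offset, count, offset, count))
    return sorted(row[0] for row in rows)


# Each branch returns 5 rows, but they are the same 5 values,
# so the deduplicated "page" holds 5 rows rather than 10.
page = limited_union(conn, 0, 5)
```

Because both branches contribute the same values here, the page shrinks to half of what was requested, which is the behaviour described above.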
Answer 0 (score: 2)
You don't actually need the subquery. The following works for the first 100 rows:
(SELECT DISTINCT fs.smr as val
FROM `fr_search` as fs
ORDER BY val
LIMIT 100
)
UNION
(SELECT DISTINCT fs.dmr as val
FROM `fr_search` as fs
ORDER BY val
LIMIT 100
)
ORDER BY val
LIMIT 100;
However, once you start adding offsets, it gets more complicated. For the next 100 rows:
(SELECT DISTINCT fs.smr as val
FROM `fr_search` as fs
ORDER BY val
LIMIT 200
)
UNION
(SELECT DISTINCT fs.dmr as val
FROM `fr_search` as fs
ORDER BY val
LIMIT 200
)
ORDER BY val
LIMIT 100, 100;
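The problem is that you don't know which subquery the second set of rows will come from, so each subquery has to return everything up to the end of the requested page.
If you really do need to page through the result set, I would suggest storing it in a temporary table and paging off the temporary table.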
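A rough sketch of the temporary-table approach, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The table name `vals`, the helper `fetch_page`, and the sample rows are all assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fr_search (smr INTEGER, dmr INTEGER)")
conn.executemany(
    "INSERT INTO fr_search (smr, dmr) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 5), (5, 8), (8, 1)],  # hypothetical sample rows
)

# Pay the cost of the UNION (and its deduplication) once...
conn.execute(
    """
    CREATE TEMP TABLE vals AS
    SELECT smr AS val FROM fr_search
    UNION
    SELECT dmr AS val FROM fr_search
    """
)


def fetch_page(conn, page, page_size):
    """...then page through the already-deduplicated values."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT val FROM vals ORDER BY val LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [row[0] for row in rows]


pages = [fetch_page(conn, n, 2) for n in (1, 2, 3)]
```

With this layout every page has exactly `page_size` rows until the values run out, because deduplication happens before any limit is applied. On a large table, an index on the temporary table's `val` column would likely help the paging queries.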
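Answer 1 (score: 1)
Query optimization always has two sides to it, and it is often an iterative process of trying, measuring, and comparing.
The best query is most likely the simple, straightforward one: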
SELECT v.val
FROM (
SELECT fs.smr as val
FROM `fr_search` as fs
UNION
SELECT fs.dmr as val
FROM `fr_search` as fs
) as v
ORDER BY v.val LIMIT ?,?;
For it to run efficiently, you need two indexes:
fr_search.smr
fr_search.dmr
If the optimizer can't handle the above well, try using index hints to force it to use the indexes.
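As a small sketch of that setup, the snippet below creates the two single-column indexes and runs the straightforward query. It uses SQLite via Python's sqlite3 as a stand-in for MySQL; the index names and sample rows are made up for illustration, and MySQL-specific index hints are not shown because SQLite has no equivalent syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fr_search (smr INTEGER, dmr INTEGER)")
conn.executemany(
    "INSERT INTO fr_search (smr, dmr) VALUES (?, ?)",
    [(3, 1), (1, 4), (4, 1), (1, 5), (5, 9)],  # hypothetical sample rows
)

# One index per column, so each UNION branch can read its values from an index.
# The index names are assumptions, not taken from the original schema.
conn.execute("CREATE INDEX idx_fr_search_smr ON fr_search (smr)")
conn.execute("CREATE INDEX idx_fr_search_dmr ON fr_search (dmr)")

sql = """
    SELECT v.val
    FROM (
        SELECT fs.smr AS val FROM fr_search AS fs
        UNION
        SELECT fs.dmr AS val FROM fr_search AS fs
    ) AS v
    ORDER BY v.val LIMIT ?, ?
"""
# LIMIT ?, ? takes the offset first, then the row count.
first_page = [row[0] for row in conn.execute(sql, (0, 3))]
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('fr_search')"))
```

Whether the optimizer actually uses the indexes depends on the engine and the data, so it is worth checking the query plan on the real table.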
In the most extreme case, you could try to work around the problem with the following:
SELECT v.val
FROM (
(SELECT DISTINCT fs.smr as val
FROM `fr_search` as fs
ORDER BY fs.smr LIMIT ?)
UNION
(SELECT DISTINCT fs.dmr as val
FROM `fr_search` as fs
ORDER BY fs.dmr LIMIT ?)
) as v
ORDER BY v.val LIMIT ?,?;
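Note that your substitutions (assuming a page size of 100) should look like the following. The values are in placeholder order: the two inner limits, then the outer offset and row count (MySQL's LIMIT ?,? takes the offset first):
Page 1: 100, 100, 0, 100
Page 2: 200, 200, 100, 100
Page 3: 300, 300, 200, 100
Page 4: 400, 400, 300, 100
etc.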
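The reason is that you need to allow for a possible imbalance in the cross-column ordering that favors either one of the columns. For example, on page 4 all of the 400 lowest values could come from a single column, so each subquery has to return its first 400 rows.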
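The arithmetic behind those substitutions can be sketched as follows, assuming MySQL's `LIMIT offset, row_count` argument order for the outer clause. The function names `page_params` and `simulate_page` are hypothetical, and `simulate_page` is only a pure-Python model of the query, not something that talks to a database.

```python
def page_params(page, page_size=100):
    """Return (inner_limit_smr, inner_limit_dmr, outer_offset, outer_row_count)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    inner_limit = page * page_size          # each branch must reach the end of the page
    outer_offset = (page - 1) * page_size   # rows to skip in the merged list
    return (inner_limit, inner_limit, outer_offset, page_size)


def simulate_page(smr_values, dmr_values, page, page_size=100):
    """Model the query: DISTINCT + LIMIT per branch, UNION, then the outer LIMIT."""
    limit_a, limit_b, offset, count = page_params(page, page_size)
    branch_a = sorted(set(smr_values))[:limit_a]
    branch_b = sorted(set(dmr_values))[:limit_b]
    merged = sorted(set(branch_a) | set(branch_b))
    return merged[offset:offset + count]


# Unbalanced toy data: every low value lives in the smr column.
smr = list(range(1000))
dmr = list(range(5000, 6000))
page_4 = simulate_page(smr, dmr, page=4)
```

With this data, page 4 is served entirely by the first branch, which is why the inner limits cannot be smaller than page × page size.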
Answer 2 (score: 0)
You have two options:
You can use SELECT DISTINCT in both the inner and the outer queries:
SELECT DISTINCT val
FROM
(
(SELECT DISTINCT fs.smr as val
FROM `fr_search` as fs)
UNION ALL
(SELECT DISTINCT fs.dmr as val
FROM `fr_search` as fs)
) as vals
ORDER BY val LIMIT ?,?;
Or you can GROUP BY in the inner queries and then GROUP BY again in the outer query:
SELECT val
FROM
(
(SELECT fs.smr as val
FROM `fr_search` as fs
GROUP BY fs.smr)
UNION ALL
(SELECT fs.dmr as val
FROM `fr_search` as fs
GROUP BY fs.dmr)
) as vals
GROUP BY val
ORDER BY val LIMIT ?,?;
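In this particular scenario, both do the same thing. In both, however, you should use UNION ALL, so that the UNION step doesn't do deduplication work of its own and you know explicitly how your records are being grouped. I also moved the LIMIT clause to the outer query.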
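A quick way to check that the two variants agree is to run both on the same toy data. The sketch below does that with SQLite via Python's sqlite3 as a stand-in for MySQL; the parentheses around the UNION ALL branches are dropped because SQLite does not accept them, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fr_search (smr INTEGER, dmr INTEGER)")
conn.executemany(
    "INSERT INTO fr_search (smr, dmr) VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 1), (3, 4), (7, 4)],  # hypothetical sample rows
)

distinct_sql = """
    SELECT DISTINCT val
    FROM (
        SELECT DISTINCT fs.smr AS val FROM fr_search AS fs
        UNION ALL
        SELECT DISTINCT fs.dmr AS val FROM fr_search AS fs
    ) AS vals
    ORDER BY val LIMIT ?, ?
"""

group_by_sql = """
    SELECT val
    FROM (
        SELECT fs.smr AS val FROM fr_search AS fs GROUP BY fs.smr
        UNION ALL
        SELECT fs.dmr AS val FROM fr_search AS fs GROUP BY fs.dmr
    ) AS vals
    GROUP BY val
    ORDER BY val LIMIT ?, ?
"""

# Offset 1, row count 3: skip the lowest value, return the next three.
params = (1, 3)
distinct_page = [row[0] for row in conn.execute(distinct_sql, params)]
group_by_page = [row[0] for row in conn.execute(group_by_sql, params)]
```

Both queries deduplicate before the outer LIMIT is applied, so each page holds the requested number of rows as long as enough values remain.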