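I want to optimize my queries, so I have been looking through mysql-slow.log.
Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution on MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.
Is there any solution for this issue?
A dummy example: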
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
ORDER BY
RAND()
LIMIT 1
Answer 0 (score: 67)
Try this:
SELECT *
FROM (
SELECT @cnt := COUNT(*) + 1,
@lim := 10
FROM t_random
) vars
STRAIGHT_JOIN
(
SELECT r.*,
@lim := @lim - 1
FROM t_random r
WHERE (@cnt := @cnt - 1)
AND RAND(20090301) < @lim / @cnt
) i
This is especially efficient on MyISAM (since COUNT(*) is instant), but even on InnoDB it is 10 times more efficient than ORDER BY RAND().
The main idea here is that we don't sort; instead, we keep two variables and calculate the running probability of a row being selected at the current step.
For more detail, see this article on my blog.
Update:
If you need to select just a single random record, try this:
SELECT aco.*
FROM (
SELECT minid + FLOOR((maxid - minid) * RAND()) AS randid
FROM (
SELECT MAX(ac_id) AS maxid, MIN(ac_id) AS minid
FROM accomodation
) q
) q2
JOIN accomodation aco
ON aco.ac_id =
COALESCE
(
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_id > randid
AND ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
),
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
)
)
This assumes your ac_id values are distributed more or less evenly.
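The running-probability idea behind the first query in this answer can be sketched in Python. This is only an illustration of the technique, not the MySQL query itself; the names `sample_in_one_pass`, `remaining`, and `needed` are made up here, with `remaining` playing the role of @cnt and `needed` the role of @lim.

```python
import random

def sample_in_one_pass(rows, k, rng=random):
    """Pick k rows in a single ordered pass, without sorting."""
    remaining = len(rows)   # rows not yet examined (like @cnt)
    needed = k              # rows still to be picked (like @lim)
    picked = []
    for row in rows:
        if needed == 0:
            break
        # Running probability: rows still needed / rows still to examine.
        if rng.random() < needed / remaining:
            picked.append(row)
            needed -= 1
        remaining -= 1
    return picked

print(sample_in_one_pass(list(range(1, 101)), 10))
```

Once the rows still needed equal the rows left, the probability reaches 1, so the pass returns exactly k rows whenever at least k exist, and the result stays in scan order.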
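Answer 1 (score: 13)
It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it is still pretty random.
However, you should be able to do it in one query (for selecting a single value) using this: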
SELECT [fields] FROM [table] JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM [table])) AS rid) AS r WHERE id >= r.rid ORDER BY id LIMIT 1
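To see why large gaps in the ID field matter, here is a small Python sketch of the "first id at or above a random threshold" rule. It is an illustration on made-up data, not MySQL code; `pick_by_threshold` and the sample ids are hypothetical.

```python
import bisect
import random
from collections import Counter

def pick_by_threshold(sorted_ids, rng=random):
    """Return the first id >= a random threshold in 0 .. max_id - 1."""
    # Same range as FLOOR(RAND() * MAX(id)).
    threshold = rng.randrange(sorted_ids[-1])
    # bisect_left gives the position of the first id >= threshold.
    return sorted_ids[bisect.bisect_left(sorted_ids, threshold)]

ids = [1, 2, 3, 100]      # a large gap between 3 and 100
rng = random.Random(0)    # seeded only so the run is repeatable
counts = Counter(pick_by_threshold(ids, rng) for _ in range(10_000))
print(counts.most_common())
```

In this sketch the id that follows the gap absorbs every threshold that falls inside the gap, so it is picked far more often than the others. With evenly spaced ids the skew disappears.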
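Other solutions:
Add a permanent float field called rnd to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random"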
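A minimal sketch of the precomputed random column idea, using Python's built-in sqlite3 module instead of MySQL and PHP so that it runs on its own. The table `items`, the column `rnd`, the index name, and the wrap-around fallback are assumptions made for this illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, rnd REAL)")
conn.execute("CREATE INDEX idx_items_rnd ON items (rnd)")

# Fill the permanent column with random numbers once, at insert time.
fill = random.Random(42)
conn.executemany(
    "INSERT INTO items (name, rnd) VALUES (?, ?)",
    [(f"item-{i}", fill.random()) for i in range(1000)],
)

def pick_random_row(conn, rng=random):
    """Generate a threshold in the application and take the next rnd above it."""
    threshold = rng.random()
    row = conn.execute(
        "SELECT id, name FROM items WHERE rnd > ? ORDER BY rnd LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # Threshold is above every stored value: wrap around to the smallest.
        row = conn.execute("SELECT id, name FROM items ORDER BY rnd LIMIT 1").fetchone()
    return row

print(pick_random_row(conn))
```

The index on the random column is what lets the lookup avoid a full sort. The stored values never change, so rows whose value follows a wide interval are favored until the column is refilled.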
Answer 2 (score: 1)
Here is how I would do it:
SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*)
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != 'draft'
AND c.acat_slug != 'vendeglatohely'
AND a.ac_images != 'b:0;')));
SET @sql := CONCAT('
SELECT a.ac_id,
a.ac_status,
a.ac_name,
a.ac_status,
a.ac_images
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != ''draft''
AND c.acat_slug != ''vendeglatohely''
AND a.ac_images != ''b:0;''
LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
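The count-then-offset approach can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the MySQL prepared statement above: the table is reduced to two columns, the data is made up, and `pick_by_offset` is a hypothetical helper.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accomodation (ac_id INTEGER PRIMARY KEY, ac_status TEXT)")
# Made-up data: ids with gaps, every third row is a draft.
conn.executemany(
    "INSERT INTO accomodation (ac_id, ac_status) VALUES (?, ?)",
    [(i * 7, "draft" if i % 3 == 0 else "live") for i in range(1, 301)],
)

FILTER = "ac_status != 'draft'"

def pick_by_offset(conn, rng=random):
    """Count the matching rows, then fetch the row at a random offset."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM accomodation WHERE {FILTER}"
    ).fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count - 1, never count itself
    return conn.execute(
        f"SELECT ac_id, ac_status FROM accomodation WHERE {FILTER} "
        "ORDER BY ac_id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

print(pick_by_offset(conn))
```

The offset has to stay below the row count, otherwise the second query returns nothing. Unlike the id-based approaches, gaps in the ids do not skew the choice, at the cost of skipping over `offset` rows on every call.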
Answer 3 (score: 0)
This gives you a single subquery that uses the index to get a random id, and then the outer query fetches your joined tables.
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category,
-- MySQL rejects IS IN and LIMIT inside an IN subquery, so the random id comes from a derived table
(SELECT ac_id FROM accomodation ORDER BY RAND() LIMIT 1) AS pick
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND accomodation.ac_id = pick.ac_id
Answer 4 (score: 0)
The solution for your dummy example would be:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation
JOIN
accomodation_category
ON accomodation.ac_category = accomodation_category.acat_id
JOIN
(
SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
) AS Choices
-- ON ... >= (rather than USING) falls through to the next existing id when the random id hits a gap
ON accomodation.ac_id >= Choices.ac_id
WHERE accomodation.ac_status != 'draft'
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
ORDER BY accomodation.ac_id
LIMIT 1
To read more about alternatives to ORDER BY RAND(), see this article.
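Answer 5 (score: 0)
I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate that solution into all queries, especially complicated ones involving many subqueries on multiple large tables.
So I am using a less optimized solution. Fundamentally, it works the same way as Quassnoi's solution.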
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size
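$size * $factor / [accomodation_table_row_count]
works out the probability of picking a random row. rand() generates a random number, and the row is selected if rand() is smaller than or equal to that probability. This effectively performs a random selection that limits the table size. Since there is a chance it will return fewer rows than the defined limit, we need to increase the probability to make sure we select enough rows. Hence we multiply $size by $factor (I usually set $factor = 2, which works in most cases). Finally we apply LIMIT $size.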
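A rough Python sketch of this probability filter, useful for getting a feel for how often a given $factor falls short. It simulates the row-by-row test in plain Python rather than running SQL; `probabilistic_limit` and the sizes used are assumptions for illustration.

```python
import random

def probabilistic_limit(rows, size, factor=2, rng=random):
    """Keep each row with probability size * factor / len(rows), stop at `size` rows."""
    if not rows:
        return []
    probability = size * factor / len(rows)
    picked = []
    for row in rows:
        if rng.random() <= probability:
            picked.append(row)
            if len(picked) == size:   # the LIMIT $size step
                break
    return picked

rows = list(range(2_000))
rng = random.Random(1)    # seeded only so the run is repeatable
shortfalls = sum(len(probabilistic_limit(rows, 10, 2, rng)) < 10 for _ in range(200))
print(f"runs returning fewer than 10 rows: {shortfalls} of 200")
```

In this sketch a larger factor makes shortfalls rarer, but the limit is then usually filled earlier in the scan, so rows late in the scan order are returned less often. That is worth keeping in mind when choosing the factor.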
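The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. That would run the fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I just do a simple count plus selection. In your case, it would look like this: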
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size
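The tricky part is working out the right probability. As you can see, the following expression only calculates a rough size for the temporary joined table (in fact, far too rough, since it is the size of the full cross join): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category))
But you can refine this logic to give a closer approximation of the table size. Note that it is better to OVER-select than to under-select rows, i.e. if the probability is set too low, you risk not selecting enough rows.
This solution runs slower than Quassnoi's solution because we need to recalculate the table size. However, I find this code much more manageable. It is a trade-off between accuracy plus performance and coding complexity. Having said that, on large tables this is still far faster than ORDER BY RAND().
Note: if the query logic permits, perform the random selection as early as possible, before any join operations.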
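Answer 6 (score: 0)
(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)
Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column
Those 5 cases can be made very efficient for large tables. See my blog for the details.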
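The blog's own queries are not reproduced here. As a hedged illustration of why the first two cases are cheap, this Python sketch picks ids directly from a gap-free range, so every pick is guaranteed to exist. `random_ids_no_gaps` and the id bounds are hypothetical.

```python
import random

def random_ids_no_gaps(min_id, max_id, how_many=1, rng=random):
    """Pick `how_many` distinct ids from the gap-free range min_id..max_id."""
    return rng.sample(range(min_id, max_id + 1), how_many)

# The ids could then be used in a primary-key lookup such as WHERE id IN (...).
print(random_ids_no_gaps(1, 1_000_000, 10))
```

With no gaps, no sort and no scan are needed: the ids are chosen in the application and fetched by key.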
Answer 7 (score: -1)
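// NUM_OF_ROWS_OR_CLOSE_TO_IT is assumed to be a constant defined elsewhere.
// Try random ids until one hits an existing row; give up after $maxTries misses.
function getRandomRow(PDO $db, $maxTries = 100) {
    for ($i = 0; $i < $maxTries; $i++) {
        $id = mt_rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
        $res = getRowById($db, $id);
        if (!empty($res)) {
            return $res;
        }
    }
    return null;
}

// rowid is a key on the table; `table` is a placeholder table name.
function getRowById(PDO $db, $rowid) {
    $stmt = $db->prepare('SELECT * FROM `table` WHERE rowid = ?');
    $stmt->execute([$rowid]);
    return $stmt->fetch(PDO::FETCH_ASSOC); // false when no row matches
}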