How can I optimize MySQL's ORDER BY RAND() function?

Asked: 2009-08-07 12:55:24

Tags: mysql random performance

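I'd like to optimize my queries, so I have been looking through mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

A dummy example: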

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
ORDER BY
        RAND()
LIMIT 1

8 answers:

Answer 0 (score: 67)

Try this:

SELECT  *
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)
                AND RAND(20090301) < @lim / @cnt
        ) i

This is especially efficient on MyISAM (since COUNT(*) is instant), but even on InnoDB it is 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort. Instead, we keep two variables and calculate the running probability of a row being selected at the current step.

See this article in my blog for more detail.

Update:

If you need to select just a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id values are distributed more or less evenly.
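
For readers who find the user-variable query at the top of this answer hard to follow, here is a minimal Python sketch of the same single-pass idea (selecting each row with probability "rows still needed / rows still unseen"). It is only an illustration under assumptions: the function name `sample_without_sort` and the in-memory list standing in for the table are hypothetical, not part of the answer.

```python
import random

def sample_without_sort(rows, lim, rng=random):
    """Pick up to `lim` rows in one pass, with no sorting step."""
    remaining = len(rows)   # counterpart of @cnt once it has been decremented for the row
    picked = []
    for row in rows:
        # Running probability: rows still needed / rows not yet scanned.
        if rng.random() < lim / remaining:
            picked.append(row)
            lim -= 1        # counterpart of @lim := @lim - 1 (selected rows only)
        remaining -= 1      # counterpart of @cnt := @cnt - 1 (every scanned row)
    return picked

if __name__ == "__main__":
    print(sample_without_sort(list(range(1000)), 10))
```

Once the number of unseen rows equals the number still needed, the probability reaches 1, so the pass returns exactly `lim` rows whenever the input holds at least that many.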

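Answer 1 (score: 13)

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it is still pretty random.

However, you should be able to do it in one query using this (for selecting a single value):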

SELECT [fields] FROM [table] WHERE id >= FLOOR(RAND() * (SELECT MAX(id) FROM [table])) LIMIT 1
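
To see why gaps in the ID field matter, here is a small Python sketch of the "random point, then first id at or after it" idea. It is an illustration only: `pick_next_id` and the sorted list standing in for the indexed id column are hypothetical names, and drawing the point between the smallest and largest id is an assumption made for this sketch.

```python
import bisect
import random

def pick_next_id(sorted_ids, rng=random):
    """Pick a random point in the id range and return the first id at or after it."""
    lo, hi = sorted_ids[0], sorted_ids[-1]
    target = rng.randint(lo, hi)            # inclusive on both ends
    return sorted_ids[bisect.bisect_left(sorted_ids, target)]

if __name__ == "__main__":
    ids = [1, 2, 3, 100]                    # large gap before id 100
    draws = [pick_next_id(ids) for _ in range(10000)]
    for i in ids:
        print(i, draws.count(i))
```

With ids 1, 2, 3, 100, almost every draw lands in the gap and resolves to 100, which is the bias the answer warns about. With evenly spread ids the picks come out close to uniform.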

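Other solutions:

  • Add a permanent float field called rnd to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random"
  • Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
  • Cache the results of the query as HTML and keep it for a few hours.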
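
The first bullet (a permanent random column) could look roughly like the Python sketch below. Everything here is an assumption for illustration: it uses an in-memory SQLite database as a stand-in for MySQL, and the `items` table, its columns, and the function name are hypothetical. The `ORDER BY rnd LIMIT 1` and the wrap-around fallback are additions of this sketch, not part of the answer.

```python
import random
import sqlite3

def pick_by_random_column(conn, rng=random):
    """Return one row using a precomputed, indexed `rnd` column."""
    r = rng.random()
    row = conn.execute(
        "SELECT id, name FROM items WHERE rnd > ? ORDER BY rnd LIMIT 1", (r,)
    ).fetchone()
    if row is None:
        # r is above the largest stored rnd value: wrap around to the smallest one.
        row = conn.execute(
            "SELECT id, name FROM items ORDER BY rnd LIMIT 1"
        ).fetchone()
    return row

# Demo setup with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, rnd REAL)")
conn.execute("CREATE INDEX idx_items_rnd ON items (rnd)")
conn.executemany(
    "INSERT INTO items (name, rnd) VALUES (?, ?)",
    [(f"item{i}", random.random()) for i in range(100)],
)
print(pick_by_random_column(conn))
```

Because the lookup walks the index on `rnd` instead of sorting the table, it stays cheap as the table grows. The trade-off is that a row's chance of being picked depends on the gap below its `rnd` value, so the column may need refreshing from time to time.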

Answer 2 (score: 1)

This is how I do it:

SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*)
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != 'draft'
        AND c.acat_slug != 'vendeglatohely'
        AND a.ac_images != 'b:0;')));

SET @sql := CONCAT('
  SELECT  a.ac_id,
        a.ac_status,
        a.ac_name,
        a.ac_status,
        a.ac_images
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != ''draft''
        AND c.acat_slug != ''vendeglatohely''
        AND a.ac_images != ''b:0;''
  LIMIT ', @r, ', 1');

PREPARE stmt1 FROM @sql;

EXECUTE stmt1;

Answer 3 (score: 0)

This gives you a single subquery that uses the index to fetch a random id, and then the outer query fires to fetch your joined table.

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND accomodation.ac_id = (
        SELECT accomodation.ac_id FROM accomodation ORDER BY RAND() LIMIT 1
)

Answer 4 (score: 0)

A solution for your dummy example would be:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation
        JOIN 
            accomodation_category 
            ON accomodation.ac_category = accomodation_category.acat_id
        JOIN 
            ( 
               SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
            ) AS Choices 
            USING (ac_id)
WHERE   accomodation.ac_id >= Choices.ac_id 
        AND accomodation.ac_status != 'draft'
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
LIMIT 1

To read more about alternatives to ORDER BY RAND(), you should read this article.

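Answer 5 (score: 0)

I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate that solution into all queries, especially complicated ones involving many subqueries on multiple large tables.

So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.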

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size

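$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. rand() generates a random number, and the row is selected if rand() is smaller than or equal to that probability. This effectively performs a random selection that limits the table size. Since it may return fewer rows than the defined limit, we need to increase the probability to make sure we select enough rows. Hence we multiply $size by $factor (I usually set $factor = 2, which works in most cases). Finally we apply LIMIT $size.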
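
As a rough illustration of the paragraph above, here is a minimal Python sketch of the same filter-then-limit logic. The function name `prefilter_sample` and the list standing in for the joined table are hypothetical; only `size` and `factor` correspond to $size and $factor in the query.

```python
import random

def prefilter_sample(rows, size, factor=2, rng=random):
    """Keep each row with probability size * factor / len(rows), then stop at `size` rows."""
    if not rows:
        return []
    p = size * factor / len(rows)    # may exceed 1 for small tables; every row then passes
    picked = []
    for row in rows:
        if rng.random() <= p:
            picked.append(row)
            if len(picked) == size:  # plays the role of LIMIT $size
                break
    return picked

if __name__ == "__main__":
    print(prefilter_sample(list(range(10000)), 5))
```

Note that the cut-off keeps the first matches in scan order, so rows scanned earlier are somewhat more likely to appear, and the result can still hold fewer than `size` rows when the filter passes too few.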

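The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. This would run fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I just do a simple count + select. In your case it would look like this: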

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size

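The tricky part is working out the right probability. As you can see, the following code only calculates a rough temp table size (in fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)). You can refine this logic to give a closer approximation of the table size. Note that it is better to OVER-select than to under-select rows, i.e. if the probability is set too low, you risk not selecting enough rows.

This solution runs slower than Quassnoi's solution since we need to recalculate the table size. However, I find this code a lot more manageable. It is a trade-off between accuracy + performance on one side and coding complexity on the other. Having said that, on large tables this is still far faster than ORDER BY RAND().

Note: If the query logic permits, perform the random selection as early as possible, before any join operations.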

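Answer 6 (score: 0)

(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)

Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column

These 5 cases can be made very efficient for large tables. See my blog for the details.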

Answer 7 (score: -1)

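// Assumption: $pdo is an open PDO connection to the database
function getRandomRow(PDO $pdo) {
    $id = rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
    $res = getRowById($pdo, $id);
    if (!empty($res)) {
        return $res;
    }
    // No row has that id: try again with a new random id
    return getRandomRow($pdo);
}

// rowid is a key on table
function getRowById(PDO $pdo, $rowid = false) {
    $stmt = $pdo->prepare('SELECT * FROM `table` WHERE rowid = ?');
    $stmt->execute([$rowid]);
    return $stmt->fetch(PDO::FETCH_ASSOC);
}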