MySQL - How do I replace ORDER BY in a more complex query?

Time: 2012-09-22 14:59:42

Tags: php mysql sql random sql-order-by

To fetch records from a table, I use this MySQL query:

SELECT 
    a.id as aid, a.data1 as adata1, a.data2 as adata2,
    b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id ) 
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND() 
LIMIT 1

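This query returns exactly the records I need, but unfortunately it is very slow because of RAND().

I have found several ways to avoid the RAND() function, for example here. My problem is that I still cannot see how to replace RAND() in this particular query. In simple queries, replacing RAND() is not a problem, but I do not know how to do it in the example above, because the WHERE clause has more conditions.

3 Answers:

Answer 0: (score: 1)

Since you are using MySQL, you can try the following SQL statements. They first count the matching rows, then pick a random offset based on that count. Because LIMIT does not accept a user variable directly, a statement is then prepared so that the computed offset can be bound to the ? placeholder, and the statement is executed.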

SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100;
SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED);
PREPARE mystatement FROM "SELECT 
                          a.id as aid, a.data1 as adata1, a.data2 as adata2,
                          b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                          FROM table AS a
                          JOIN table AS b ON ( a.id <> b.id ) 
                          WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1";
EXECUTE mystatement USING @offset;
DEALLOCATE PREPARE mystatement;

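On large data sets this should run faster than ORDER BY RAND(). Please try it and let me know... ;-)

Edit

The queries will not work in phpMyAdmin, so either run them from the MySQL console or write a PHP script. There are two options. The first is to let MySQL do all the work: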

mysql_query('SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100');
mysql_query('SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED)');
mysql_query('PREPARE mystatement FROM "SELECT 
                          a.id as aid, a.data1 as adata1, a.data2 as adata2,
                          b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                          FROM table AS a
                          JOIN table AS b ON ( a.id <> b.id ) 
                          WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1"');
$res = mysql_query('EXECUTE mystatement USING @offset');
$row = mysql_fetch_assoc($res);
print_r($row);

The second option, which may be faster, does part of the work in MySQL and the rest in the programming language (PHP in our case):

$res = mysql_query("SELECT COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100");
$row = mysql_fetch_array($res);
$offset = rand(0, $row[0]-1);

$res = mysql_query("SELECT 
                              a.id as aid, a.data1 as adata1, a.data2 as adata2,
                              b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                              FROM table AS a
                              JOIN table AS b ON ( a.id <> b.id ) 
                              WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT $offset, 1");
$row = mysql_fetch_assoc($res);

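Another alternative I found for speeding up ORDER BY RAND() is a query like the following, which first discards most rows with a cheap RAND() filter and only sorts the few that remain: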

SELECT 
    a.id as aid, a.data1 as adata1, a.data2 as adata2,
    b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id ) 
WHERE (RAND() < (SELECT ((1/COUNT(*))*10) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) ) )
 AND (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND() 
LIMIT 1

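Don't forget to tell me what results you get ;-).

Answer 1: (score: 1)

Your question is not very specific. How big is the table? What exactly is "very slow"? You are trying to find all pairs of records in the table where data1 = 1 and the ratings differ by less than 100. In the version below, I have moved all the conditions into the ON clause so that they sit together and read more clearly: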

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a join
     table AS b
     ON a.id <> b.id and
        a.data1 = b.data1 and
        a.data1 = 1 and b.data1 = 1 and
        ABS( a.rating - b.rating ) < 100
ORDER BY RAND() 
LIMIT 1

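I also added the extra condition a.data1 = b.data1, because it helps the SQL engine recognize the join as an equijoin, which should help join performance.

Assuming data1 is selective (meaning relatively few records have data1 = 1), you should be able to speed this up with an index on (data1, id) or (data1, rating).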
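
As a rough sketch of the indexing suggestion: the statements below assume a hypothetical table name items (the question's placeholder name table is a reserved word in MySQL) and made-up index names. The rating condition is written as an explicit range on b.rating, which is equivalent to ABS( a.rating - b.rating ) < 100 and is the form an index on (data1, rating) may be able to serve. Whether the optimizer actually uses either index depends on the data, so it is worth checking with EXPLAIN.

```sql
-- Assumption: the real table is called `items`; the index names are invented.
CREATE INDEX idx_items_data1_rating ON items (data1, rating);
CREATE INDEX idx_items_data1_id     ON items (data1, id);

-- Inspect the plan for the self-join without the random ordering.
EXPLAIN
SELECT a.id AS aid, b.id AS bid
FROM items AS a
JOIN items AS b
  ON a.data1 = 1
 AND b.data1 = 1
 AND a.id <> b.id
 AND b.rating > a.rating - 100   -- same as ABS(a.rating - b.rating) < 100
 AND b.rating < a.rating + 100;
```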

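If you know that every record has at least one match (that is, every record has another record with a similar rating), the following variant should do better: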

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM (select *
      from table AS a
      where a.data1 = 1
      order by rand()
      limit 1
     ) a join
     table AS b
     ON a.id <> b.id and
        a.data1 = b.data1 and
        a.data1 = 1 and b.data1 = 1 and
        ABS( a.rating - b.rating ) < 100
ORDER BY RAND() 
LIMIT 1

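This picks a random record first and then does the self-join.

This makes me think you could take a different approach to the problem, as follows. First, compute the ratings present in the data you are looking at. Then pick a random pair of ratings whose difference is less than 100, and then find random records that match those ratings. With an index on data1 and rating, this approach may well be the fastest.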
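
The rating-pair idea could be sketched roughly as follows. This is only one possible reading of it, not a tested solution: the table name items and the user variables are assumptions, and rating pairs are drawn without weighting by how many records share each rating, so record pairs are not equally likely. It only pays off when there are far fewer distinct ratings than rows.

```sql
-- Assumption: the real table is called `items`.
SET @rating_a = NULL, @rating_b = NULL, @aid = NULL, @bid = NULL;

-- Steps 1 and 2: list the distinct ratings with data1 = 1, then pick one
-- random pair of ratings that differ by less than 100. A rating may pair
-- with itself only if at least two records share it.
SELECT ra.rating, rb.rating
  INTO @rating_a, @rating_b
FROM (SELECT rating, COUNT(*) AS cnt
      FROM items WHERE data1 = 1 GROUP BY rating) AS ra
JOIN (SELECT rating, COUNT(*) AS cnt
      FROM items WHERE data1 = 1 GROUP BY rating) AS rb
  ON ABS(ra.rating - rb.rating) < 100
WHERE ra.rating <> rb.rating OR ra.cnt > 1
ORDER BY RAND()
LIMIT 1;

-- Step 3: pick one random record for each of the two ratings.
SELECT id INTO @aid
FROM items
WHERE data1 = 1 AND rating = @rating_a
ORDER BY RAND()
LIMIT 1;

SELECT id INTO @bid
FROM items
WHERE data1 = 1 AND rating = @rating_b AND id <> @aid
ORDER BY RAND()
LIMIT 1;

-- Return the pair in the same shape as the original query.
SELECT a.id AS aid, a.data1 AS adata1, a.data2 AS adata2,
       b.id AS bid, b.data1 AS bdata1, b.data2 AS bdata2
FROM items AS a
JOIN items AS b ON b.id = @bid
WHERE a.id = @aid;
```

Each ORDER BY RAND() here sorts only the distinct rating pairs or the records sharing one rating, rather than the whole self-join. If no valid pair exists, the variables stay NULL and the final query returns no row.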

Answer 2: (score: 0)

If you are O.K. with a less uniform distribution over the problem space, you could try:

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
  FROM ( SELECT *
           FROM table
          WHERE data1 = 1
          ORDER
             BY RAND()
          LIMIT 1
       ) a
  JOIN table b
    ON b.data1 = 1
   AND b.id <> a.id
   AND b.rating > a.rating - 100
   AND b.rating < a.rating + 100
 ORDER
    BY RAND()
 LIMIT 1
;

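The query above picks one record at random as a, then picks one matching record at random as b. As a result, far fewer records have to be sorted and joined. The distribution is less uniform, because every choice of a is equally likely rather than being weighted by the number of possible matching choices of b, but it may be good enough for your purposes.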
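
To make the skew concrete, here is a small worked example with made-up data, assuming self-pairs are excluded as in the question. Take three records X, Y and Z with data1 = 1 and ratings 0, 90 and 180. The valid ordered pairs are (X,Y), (Y,X), (Y,Z) and (Z,Y): X and Z each have one partner, Y has two.

```latex
\begin{aligned}
\text{ORDER BY RAND() over the full join:}\quad
  & P(X,Y) = P(Y,X) = P(Y,Z) = P(Z,Y) = \tfrac{1}{4} \\
\text{pick } a \text{ first, then } b:\quad
  & P(X,Y) = \tfrac{1}{3}\cdot 1 = \tfrac{1}{3}, \qquad
    P(Z,Y) = \tfrac{1}{3}\cdot 1 = \tfrac{1}{3} \\
  & P(Y,X) = \tfrac{1}{3}\cdot\tfrac{1}{2} = \tfrac{1}{6}, \qquad
    P(Y,Z) = \tfrac{1}{3}\cdot\tfrac{1}{2} = \tfrac{1}{6}
\end{aligned}
```

Pairs that start from a record with few partners are over-represented. Also, if the randomly chosen a has no partner at all, the query returns no row and has to be retried.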