To fetch records from a table, I use this MySQL query:
SELECT
a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id )
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND()
LIMIT 1
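This query returns exactly the records I need, but unfortunately it is very slow because of RAND().
I found some ways to avoid using RAND(), for example here. My problem is that I still cannot figure out how to replace RAND() in this particular query.
In simple queries, replacing RAND() is not a problem, but I do not know how to do it in the example above, because the WHERE clause has more conditions.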
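Answer 0 (score: 1)
Since you are using MySQL, you can try the following SQL statements. They first get the count of matching rows, then pick a random offset based on that count. A prepared statement is then used so that the computed offset can be passed to LIMIT, and the statement is executed.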
SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100;
SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED);
PREPARE mystatement FROM "SELECT
a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id )
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1";
EXECUTE mystatement USING @offset;
DEALLOCATE PREPARE mystatement;
On large data sets this should run faster than ORDER BY RAND(). Please try it and let me know... ;-)
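The count-then-offset idea above can be tried out without a MySQL server. What follows is only a rough sketch, with assumptions: it uses Python's built-in sqlite3 module instead of MySQL, a hypothetical table named items (standing in for the question's table), and made-up sample rows. The helper names build_demo_db and random_pair are mine, not part of any library.

```python
import random
import sqlite3

# Shared FROM/WHERE part, mirroring the conditions in the question.
PAIR_FILTER = """
    FROM items AS a
    JOIN items AS b ON a.id <> b.id
    WHERE a.data1 = 1 AND b.data1 = 1
      AND ABS(a.rating - b.rating) < 100
"""

def build_demo_db():
    """Create a small in-memory table with made-up rows (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, "
        "data2 TEXT, rating INTEGER)"
    )
    rows = [
        (1, 1, "x", 1000),
        (2, 1, "y", 1050),
        (3, 1, "z", 1300),  # no partner within 100 rating points
        (4, 0, "w", 1020),  # filtered out because data1 <> 1
        (5, 1, "v", 1090),
    ]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return conn

def random_pair(conn, rng=random):
    """Pick one matching (aid, bid) pair via COUNT(*) plus a random offset."""
    (count,) = conn.execute("SELECT COUNT(*) " + PAIR_FILTER).fetchone()
    if count == 0:
        return None  # no pair satisfies the conditions
    offset = rng.randrange(count)  # integer in 0 .. count - 1
    return conn.execute(
        "SELECT a.id AS aid, b.id AS bid " + PAIR_FILTER + " LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

conn = build_demo_db()
print(random_pair(conn))  # one random (aid, bid) tuple
```

Note that the join still has to be evaluated for the count and again up to the chosen offset; what is avoided is sorting every matching pair by a random value.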
Edit
These queries will not work in phpMyAdmin, so run them from the MySQL console or write a PHP script. There are two options; the first lets MySQL do all the work:
mysql_query('SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100');
mysql_query('SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED)');
mysql_query('PREPARE mystatement FROM "SELECT
a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id )
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1"');
$res = mysql_query('EXECUTE mystatement USING @offset');
$row = mysql_fetch_assoc($res);
print_r($row);
The second option, which may be faster, does part of the work in MySQL and the rest in the programming language (PHP in our case):
$res = mysql_query("SELECT COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100");
$row = mysql_fetch_array($res);
$offset = rand(0, $row[0]-1);
$res = mysql_query("SELECT
a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id )
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT $offset, 1");
$row = mysql_fetch_assoc($res);
Another alternative I found for speeding up ORDER BY RAND() uses a query like this:
SELECT
a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id )
WHERE (RAND() < (SELECT ((1/COUNT(*))*10) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) ) )
AND (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND()
LIMIT 1
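Don't forget to let me know the results you get ;-).
Answer 1 (score: 1)
Your question is not very specific... How big is the table? What exactly is "very slow"? You are trying to find all pairs of records in the table where data1 = 1 and the ratings differ by less than 100. In the version below, I moved all the conditions into the ON clause so that they sit together more clearly:
SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,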
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a join
table AS b
ON a.id <> b.id and
a.data1 = b.data1 and
a.data1 = 1 and b.data1 = 1 and
ABS( a.rating - b.rating ) < 100
ORDER BY RAND()
LIMIT 1
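I also added the extra condition a.data1 = b.data1, because it helps the SQL engine recognize this as an equijoin, which should help join performance.
Assuming data1 is selective (meaning relatively few records have data1 = 1), you should be able to speed this up with an index on (data1, id) or (data1, rating).
If you know that every record has at least one match (that is, every record has another record with a similar rating), the following variant should do better:
SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,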
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM (select *
from table AS a
where a.data1 = 1
order by rand()
limit 1
) a join
table AS b
ON a.id <> b.id and
a.data1 = b.data1 and
a.data1 = 1 and b.data1 = 1 and
ABS( a.rating - b.rating ) < 100
ORDER BY RAND()
LIMIT 1
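This picks a random record first and then does the self-join.
This makes me think you could take a different approach to the problem, as follows. First collect the ratings present in the data you are looking at. Then pick a random pair of ratings whose difference is less than 100, and then find random records that match those ratings. With an index on data1 and rating, this approach may be the fastest.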
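One possible reading of the rating-pair approach is sketched below. It is only an illustration under assumptions: Python's sqlite3 stands in for MySQL, the table is a hypothetical items table with made-up rows, and the helper names are mine. Weighting each rating pair by the number of record pairs behind it is my own addition, so that every record pair ends up equally likely; the description above does not say how the rating pair should be weighted.

```python
import random
import sqlite3

def build_demo_db():
    """Small in-memory table with made-up rows (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, "
        "rating INTEGER)"
    )
    conn.execute("CREATE INDEX idx_items_data1_rating ON items (data1, rating)")
    rows = [(1, 1, 1000), (2, 1, 1000), (3, 1, 1050), (4, 1, 1300), (5, 0, 1000)]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    return conn

def random_pair_by_rating(conn, rng=random, max_diff=100):
    # Step 1: distinct ratings (with row counts) among records with data1 = 1.
    counts = dict(conn.execute(
        "SELECT rating, COUNT(*) FROM items WHERE data1 = 1 GROUP BY rating"
    ))
    # Step 2: ordered rating pairs closer than max_diff, each weighted by
    # how many (a, b) record pairs with a.id <> b.id it represents.
    pairs, weights = [], []
    for r1, c1 in counts.items():
        for r2, c2 in counts.items():
            if abs(r1 - r2) < max_diff:
                weight = c1 * (c1 - 1) if r1 == r2 else c1 * c2
                if weight > 0:
                    pairs.append((r1, r2))
                    weights.append(weight)
    if not pairs:
        return None  # no two records are close enough
    r1, r2 = rng.choices(pairs, weights=weights, k=1)[0]
    # Step 3: a random record for each chosen rating, never the same record.
    a_ids = [row[0] for row in conn.execute(
        "SELECT id FROM items WHERE data1 = 1 AND rating = ?", (r1,))]
    aid = rng.choice(a_ids)
    b_ids = [row[0] for row in conn.execute(
        "SELECT id FROM items WHERE data1 = 1 AND rating = ? AND id <> ?",
        (r2, aid))]
    bid = rng.choice(b_ids)
    return aid, bid

conn = build_demo_db()
print(random_pair_by_rating(conn))  # one random (aid, bid) tuple
```

This only pays off when there are far fewer distinct ratings than records, since the pair enumeration in step 2 is quadratic in the number of distinct ratings.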
Answer 2 (score: 0)
If you are OK with a less uniform distribution over the problem space, you could try:
SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM ( SELECT *
FROM table
WHERE data1 = 1
ORDER
BY RAND()
LIMIT 1
) a
JOIN table b
ON b.data1 = 1
AND b.rating BETWEEN a.rating - 100 AND a.rating + 100
ORDER
BY RAND()
LIMIT 1
;
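The above picks one record at random as a, and then picks one matching record at random as b. As a result, far fewer records are sorted and joined. It is less uniform, because every choice of a is equally likely rather than being weighted by the number of possible corresponding choices of b, but perhaps it is good enough for your purposes.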
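To see how far the two-step pick can drift from a uniform choice, here is a small worked example in Python. It is just a sketch with made-up ratings. It mirrors the inclusive BETWEEN window from the query and, as an extra assumption of mine, does not let a record pair with itself. The function name is hypothetical.

```python
from fractions import Fraction

def two_step_pair_probabilities(ratings, window=100):
    """Exact probability of each (aid, bid) pair when a is picked uniformly
    and b is then picked uniformly among a's partners.

    ratings maps record id -> rating for records with data1 = 1.
    """
    n = len(ratings)
    probs = {}
    for aid, a_rating in ratings.items():
        partners = [
            bid for bid, b_rating in ratings.items()
            if bid != aid and abs(a_rating - b_rating) <= window
        ]
        for bid in partners:
            # P(a) = 1/n, P(b | a) = 1/len(partners)
            probs[(aid, bid)] = Fraction(1, n) * Fraction(1, len(partners))
    return probs

# Made-up data: record 2 is within range of both 1 and 3; 1 and 3 are too far apart.
probs = two_step_pair_probabilities({1: 1000, 2: 1050, 3: 1120})
for pair, p in sorted(probs.items()):
    print(pair, p)
# (1, 2) 1/3
# (2, 1) 1/6
# (2, 3) 1/6
# (3, 2) 1/3
```

A uniform pick over the four valid pairs would give each of them 1/4, so pairs whose first record has few partners are over-represented.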