Question

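I need to pull data semi-randomly, i.e. random items, but only from a certain subset of the data. I also need to do this many times.

My first approach was to use Postgres ORDER BY random() and filter with WHERE clauses, but that performs poorly.

Do you have any suggestions?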

Answer 1

You can avoid ORDER BY random() by using something like the following:

select * from table where [your conditions] and random()>.9

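This selects roughly 10% of the rows that match all the other conditions, because random() returns a value between 0 and 1 and random() > .9 is true about one time in ten. However, I am not sure whether this improves performance.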
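As a concrete illustration of the query above, here is a minimal sketch in PostgreSQL. The table name recipes and the category filter are hypothetical placeholders, not something from the question.

```sql
-- Hypothetical table and filter; replace them with your own.
-- random() returns a value in [0, 1), so "random() < 0.1" keeps each
-- matching row with probability 0.1 (same effect as "random() > 0.9").
SELECT *
FROM recipes
WHERE category = 'dessert'
  AND random() < 0.1;
```

Each row is kept or dropped independently, so the number of rows returned varies from run to run and can be zero for a small subset.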

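Another strategy:

Add a column to the data holding a random number between 1 and 1000 (for example, named randc)
Create an index on this column
Use select * from table where [your conditions] and randc > 900

Because the numbers are random, this will most likely select about 10% of the rows that meet your conditions (randc > 900 matches 100 of the 1000 possible values).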
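The steps of this strategy could look roughly like this in PostgreSQL. This is only a sketch: the table name recipes, the category filter, and the index name are assumptions made for illustration; only the column name randc comes from the answer.

```sql
-- Hypothetical table name "recipes"; randc is the column name from the text.
ALTER TABLE recipes ADD COLUMN randc integer;

-- Assign each existing row a random integer from 1 to 1000.
-- random() is evaluated once per row.
UPDATE recipes SET randc = floor(random() * 1000)::int + 1;

-- Index the column so the range filter does not need a full scan.
CREATE INDEX recipes_randc_idx ON recipes (randc);

-- randc > 900 matches the values 901..1000, about 10% of rows.
SELECT *
FROM recipes
WHERE category = 'dessert'
  AND randc > 900;
```

Note that randc is stored, so a fixed threshold returns the same rows on every run. If a different sample is needed each time, one option (my assumption, not part of the answer) is to query a different range on each call, for example randc BETWEEN 301 AND 400.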

Answer 2

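I ended up using Elasticsearch via Tire (a Ruby gem). With the right indexing, page load time went from 30+ seconds to under 1 second (and it does not depend on the DB size).

Example: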

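Recipe.search do |search|
  # Sort by a script that returns a random number for each document
  search.sort do |sort|
    sort.by({
      _script: {
        script: "Math.random()",
        type: "number",
        params: {},
        order: "asc"
      }
    })
  end

  # Return only the first hit, i.e. one random document
  search.size 1
end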

This generates:

{
  "sort": [{
    "_script": {
      "script": "Math.random()",
      "type": "number",
      "params": {},
      "order": "asc"
    }
  }],
  "size": 1
}

