Question

I have come across this same question many times on StackOverflow. None of the threads offered an efficient, reliable solution, so here we are:

I need to efficiently select one random document out of roughly 5 million documents in my MongoDB database.

I have tried getting .count and using .skip to fetch a random document, but that takes nearly three seconds and is very, very inefficient.

I cannot make changes to every document (such as adding a "random" entry) or change their _id.

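I have tried the solution of adding documents with an incremental _id (picking a random _id to avoid using .skip), but that turned into even more of a headache when I tried adding many documents within a very short time.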

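Adding data incrementally or picking a random document shouldn't be this hard. Either I am missing some common knowledge, I am doing something wrong, or this is just how it really is...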

I wanted to bring this topic up and get your responses.

Answer 1

Here is an approach that uses the default ObjectId values of _id, plus a little math and logic.

// Get the "min" and "max" timestamp values from the _id in the collection and the 
// diff between.
// 4-bytes from a hex string is 8 characters

var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    diff = max - min;

// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;

// Work out a "random" _id value in the range:
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000");

// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
   .sort({ "_id": 1 }).limit(1).toArray()[0];

That is the general logic in shell form, and it is easy to adapt.

So, the main points:

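Find the minimum and maximum primary key values in the collection.
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.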
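
The three steps above can also be tried out without a database. The following is only a sketch in plain JavaScript: the names timestampOf, boundaryFor, pickRandomId and the injectable rng parameter are made up for illustration, and the sorted array of hex strings merely stands in for the collection's _id index.

```javascript
// Sketch only: plain JavaScript, no database. `sortedIds` stands in for the
// _id index: an ascending array of 24-character lowercase hex ObjectId strings.

// The first 8 hex characters (4 bytes) of an ObjectId are its timestamp in seconds.
function timestampOf(idHex) {
  return parseInt(idHex.slice(0, 8), 16);
}

// Pad a timestamp with 16 zero characters to get the smallest possible
// ObjectId hex string for that second.
function boundaryFor(seconds) {
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// `rng` is a hypothetical injectable random source returning a value in [0, 1).
function pickRandomId(sortedIds, rng = Math.random) {
  // Step 1: min and max timestamps from the first and last _id.
  const min = timestampOf(sortedIds[0]);
  const max = timestampOf(sortedIds[sortedIds.length - 1]);

  // Step 2: a random whole-second offset inside that range.
  const offset = Math.floor(rng() * (max - min));

  // Step 3: first _id greater than or equal to the padded boundary.
  // Equal-length lowercase hex strings compare in the same order as their values.
  const boundary = boundaryFor(min + offset);
  return sortedIds.find((id) => id >= boundary);
}

// Made-up sample ids with timestamps 0x5f000000, 0x5f000010 and 0x5f000020.
const sampleIds = [
  "5f000000aaaaaaaaaaaa0001",
  "5f000010aaaaaaaaaaaa0002",
  "5f000020aaaaaaaaaaaa0003",
];
console.log(pickRandomId(sampleIds));
```

One thing the sketch makes visible: the random value is drawn over time, not over documents, so a document that follows a long gap in insert times is picked more often than one inside a dense burst. Whether that matters depends on how evenly the documents were inserted.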

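This "pads" the timestamp value in "hex" to form a valid ObjectId value, since that is what we are looking for. Using integers as the _id value is essentially simpler, but the basic idea is the same at every step.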
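
For comparison, here is a rough sketch of the integer _id variant, again in plain JavaScript with an in-memory array standing in for the _id index. The function name pickRandomIntegerId and the rng parameter are hypothetical; no padding step is needed because the random value can be compared to the ids directly.

```javascript
// Sketch only: `sortedIds` is an assumed ascending array of integer _id values.
// Gaps (for example from deleted documents) are allowed.
function pickRandomIntegerId(sortedIds, rng = Math.random) {
  const min = sortedIds[0];
  const max = sortedIds[sortedIds.length - 1];

  // Random integer in [min, max], both ends included.
  const target = min + Math.floor(rng() * (max - min + 1));

  // In-memory stand-in for
  // find({ "_id": { "$gte": target } }).sort({ "_id": 1 }).limit(1).
  return sortedIds.find((id) => id >= target);
}

console.log(pickRandomIntegerId([1, 2, 5, 9]));
```

When the target falls into a gap, the next existing id is returned, which is the same "greater than or equal" rule the ObjectId version relies on.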
