Question

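I want to select n unique elements from an array, where the array size is typically 1000 and n is 3. I need to do this inside an iterative algorithm that runs for about 3,000,000 iterations, and every iteration must produce n unique elements. Here are some existing solutions that I like, but I can't use them because of the drawbacks described below.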

import scala.util.Random
val l = Seq("a", "b", "c", "d", "e")
val ran = l.map(x => (Random.nextFloat(), x)).sortBy(_._1).map(_._2).take(3)

This approach is slow because it has to create three arrays and sort one of them.

 val list = List(1,2,3,4,5,1,2,3,4,5)
 val uniq = list.distinct
 val shuffled = scala.util.Random.shuffle(uniq)
 val sampled = shuffled.take(n)

This creates two arrays, and shuffling a large array is slow.

 val arr = Array.fill(1000)(math.random())
 for (i <- 1 to n; r = (Math.random() * arr.size).toInt) yield arr(r)

This is a faster technique, but it sometimes returns the same element more than once. Here is the output.

val xs = List(60, 95, 24, 85, 50, 62, 41, 68, 34, 57)
for (i <- 1 to n; r = (Math.random * xs.size).toInt) yield xs(r)

res: scala.collection.immutable.IndexedSeq[Int] = Vector(24, 24, 41)

As you can see, 24 is returned twice.

How can I change the last approach so that it returns unique elements? Is there a more efficient way to do the same task?

Answer 1

Your examples don't seem to be related to each other (Strings? Ints?), but maybe something like this would work.

import scala.util.Random.nextInt

val limit = 1000  // 0 to 999 inclusive
val n = 3
Iterator.iterate(Set.fill(n)(nextInt(limit)))(_ + nextInt(limit))
        .dropWhile(_.size < n)
        .next()
//res0: Set[Int] = Set(255, 426, 965)
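
The Set above holds indices rather than elements. As a minimal sketch of how it could be mapped back to the array, assuming the input is an IndexedSeq (`pickDistinct` and its `rng` parameter are hypothetical names introduced here for illustration):

```scala
import scala.util.Random

// Hypothetical helper wrapping the Set-based idea: collect n distinct
// random indices, then look up the element at each index.
def pickDistinct[A](xs: IndexedSeq[A], n: Int, rng: Random = new Random): Vector[A] = {
  require(n >= 0 && n <= xs.size, "cannot pick more distinct positions than exist")
  // Keep adding random indices until the Set holds n distinct ones.
  val indices = Iterator
    .iterate(Set.empty[Int])(_ + rng.nextInt(xs.size))
    .dropWhile(_.size < n)
    .next()
  // An IndexedSeq is also a function Int => A, so it can be passed to map.
  indices.toVector.map(xs)
}
```

For example, `pickDistinct(Vector(60, 95, 24, 85, 50, 62, 41, 68, 34, 57), 3)` returns three elements taken from three different positions. The elements are unique by position, so repeated values in the input can still appear more than once.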

Answer 2

Here is a recursive routine that does the job more efficiently than the other answers.

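It builds a list of indices and then checks whether the values are distinct. In the rare case where there are duplicates, they are removed and new values are added until all the indices are distinct.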

The other answers check that the list is distinct every time an element is added.

def randomIndices(arraySize: Int, nIndices: Int): List[Int] = {
  def loop(done: Int, res: List[Int]): List[Int] =
    if (done < nIndices) {
      loop(done + 1, (Math.random * arraySize).toInt +: res)
    } else {
      val d = res.distinct
      val dSize = d.size

      if (dSize < nIndices) {
        loop(dSize, d)
      } else {
        res
      }
    }

  if (nIndices > arraySize) {
    randomIndices(arraySize, arraySize)
  } else {
    loop(0, Nil)
  }
}

randomIndices(xs.size, 3).map(xs)

This should be efficient when the number of elements to pick is small compared to the size of the array.

Answer 3

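This is a variant of the Fisher-Yates shuffle. It does not modify the original array; instead it shuffles an array of indices into the original array. A layer of indirection solves a lot of problems. It also does not shuffle the whole array, only num items of it.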

I don't know Scala, so this is pseudocode:

partShuffle(num, originalArray)

  // 1. Make a copy of the original array's indices.
  indexAry <- new Array(originalArray.length)
  for (i <- 0; i < originalArray.length; i++)
    indexAry[i] <- i
  end for

  // 2. Shuffle num random unique indices to the front.
  for (i <- 0; i < num; i++)
    // 2.1 New pick from the unpicked part of the array.
    currentPick <- i + random(originalArray.length - i)

    // 2.2 Swap the current pick to the front of the array.
    temp <- indexAry[i]
    indexAry[i] <- indexAry[currentPick]
    indexAry[currentPick] <- temp
  end for

  // 3. Build the return array.
  returnAry <- new Array(num)
  for (i <- 0; i < num; i++)
    returnAry[i] <- originalArray[indexAry[i]]
  end for

  return returnAry

end partShuffle()

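When an index is picked, it is swapped to the front of the index array and excluded from further picks. The first pick comes from [0..size-1], the second from [1..size-1], the third from [2..size-1], and so on. The elements of the original array at the picked indices are returned in a shorter array of length num.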

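Assumptions:

Arrays are zero-based
indexAry is an array of integers
returnAry has the same element type as originalArray
random() is seeded elsewhere in the code
random(x) returns an integer from 0 (inclusive) to x (exclusive)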
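
For readers who want to try this in Scala, the pseudocode above could be sketched roughly as follows. This is one possible translation rather than the answerer's own code; the signature, the `rng` parameter, and the use of Vector for the result are assumptions made here.

```scala
import scala.util.Random

// Partial Fisher-Yates sketch: moves num random distinct indices to the
// front of an index array, then maps them back to elements.
// The original sequence is never modified.
def partShuffle[A](num: Int, original: IndexedSeq[A], rng: Random = new Random): Vector[A] = {
  require(num >= 0 && num <= original.length, "num must be between 0 and the array length")

  // 1. Make a copy of the original array's indices: 0, 1, ..., length - 1.
  val indexAry = Array.range(0, original.length)

  // 2. Shuffle num random unique indices to the front.
  var i = 0
  while (i < num) {
    // 2.1 New pick from the unpicked part of the array, i.e. [i, length).
    val currentPick = i + rng.nextInt(original.length - i)
    // 2.2 Swap the current pick to the front of the array.
    val temp = indexAry(i)
    indexAry(i) = indexAry(currentPick)
    indexAry(currentPick) = temp
    i += 1
  }

  // 3. Build the result from the first num indices.
  Vector.tabulate(num)(k => original(indexAry(k)))
}
```

A call such as `partShuffle(3, Vector(60, 95, 24, 85, 50, 62, 41, 68, 34, 57))` returns three elements from three different positions. With millions of iterations, rebuilding the index array on every call is likely to dominate the cost. Since the swaps only permute the index array, it could probably be allocated once and reused across calls.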

Answer 4

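You can generate random indices within separate ranges. Logically split the List into n parts and generate the boundary values. Because each index is drawn from a different part, the picks cannot collide.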

Example: given a list of 100 elements and n = 3, the boundaries are (0,32) (33,65) (66,99)
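
The boundary arithmetic in this example can be checked with a small sketch. `chunkBounds` is a hypothetical helper introduced here for illustration, and it assumes the last part always runs to the final index of the list.

```scala
// Hypothetical helper: inclusive (start, end) index range of each of the n parts.
def chunkBounds(limit: Int, n: Int): List[(Int, Int)] = {
  val width = limit / n // integer division, so the last part absorbs the remainder
  (1 to n).toList.map { nth =>
    val start = width * (nth - 1)
    val end   = if (nth == n) limit - 1 else width * nth - 1
    (start, end)
  }
}

// chunkBounds(100, 3) == List((0, 32), (33, 65), (66, 99))
```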

  // Gets random int between "start" and "end"
  def randomInt(start: Int, end: Int): Int =
    start + scala.util.Random.nextInt((end - start) + 1)

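  // Generates a random index inside the nth of n chunks (nth is 1-based).
  // The last chunk ends at limit - 1, so the index never runs past the end
  // of the list, even when limit is evenly divisible by n.
  def randomIntN(limit: Int, n: Int, nth: Int): Int =
    randomInt(
      (limit / n) * (nth - 1),
      if (n == nth) limit - 1 else (limit / n) * nth - 1
    )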

  // Generate sequence
  for (
    i <- 1 to n;
    r = randomIntN(xs.size, n, i)
  ) yield xs(r)

Here is some sample output:

res4: IndexedSeq[Int] = Vector(60, 50, 57)
res5: IndexedSeq[Int] = Vector(60, 85, 34)
res6: IndexedSeq[Int] = Vector(24, 50, 41)

Select n random elements from an array
