Question

Suppose we have a Scala list:

val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

We can easily remove duplicates with:

l1.distinct

or

l1.toSet.toList

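But what if we want to remove duplicates only when there are more than two of them? That is, if more than two elements have the same value, we keep the first two and drop the rest.

I can do it with the following code: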

l1.groupBy(identity).mapValues(_.take(2)).values.toList.flatten

which gives me:

List(2, 2, 5, 1, 1, 3, 3)

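The extra elements are removed, but the remaining ones no longer appear in the order they had in the original list. How can I do this while preserving the original order?

So the result for l1 should be: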

List(1, 2, 3, 1, 3, 2, 5)

Answer 1

Not very efficient.

scala> val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1: List[Int] = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

scala> l1.zipWithIndex.groupBy( _._1 ).map(_._2.take(2)).flatten.toList.sortBy(_._2).unzip._1
res10: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Answer 2

My humble answer:

def distinctOrder[A](x:List[A]):List[A] = {
    @scala.annotation.tailrec
    def distinctOrderRec(list: List[A], covered: List[A]): List[A] = {
       (list, covered) match {
         case (Nil, _) => covered.reverse
         case (lst, c) if c.count(_ == lst.head) >= 2 => distinctOrderRec(list.tail, covered)
         case _ =>  distinctOrderRec(list.tail, list.head :: covered)
       }
    }
    distinctOrderRec(x, Nil)
}

Result:

scala> val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1: List[Int] = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

scala> distinctOrder(l1)
res1: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Edit: just before going to bed, I came up with this!

l1.foldLeft(List[Int]())((total, next) => if (total.count(_ == next) >= 2) total else total :+ next)

Result:

res9: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Answer 3

Not the prettiest. I look forward to seeing other solutions.

def noMoreThan(xs: List[Int], max: Int) =
{
  def op(m: Map[Int, Int], a: Int) = {
    m updated (a, m(a) + 1)
  }
  xs.scanLeft( Map[Int,Int]().withDefaultValue(0) ) (op).tail
    .zip(xs)
    .filter{ case (m, a) => m(a) <= max }
    .map(_._2)
}

scala> noMoreThan(l1, 2)
res0: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Answer 4

A simpler version using foldLeft:

l1.foldLeft(List[Int]()){(acc, el) => 
     if (acc.count(_ == el) >= 2) acc else el::acc}.reverse

Answer 5

Similar to how distinct is implemented, but using a multiset instead of a set:

def noMoreThan[T](list : List[T], max : Int) = {
    val b = List.newBuilder[T]
    val seen = collection.mutable.Map[T,Int]().withDefaultValue(0)
    for (x <- list) {
      if (seen(x) < max) {
        b += x
        seen(x) += 1
      }
    }
    b.result()
  }

Answer 6

Based on the noMoreThan answer above, but using foldLeft:

def noMoreThanBis(xs: List[Int], max: Int) = {
  // Explicit type arguments so the empty Map is not inferred as Map[Nothing, Int]
  val initialState: (Map[Int, Int], List[Int]) = (Map[Int, Int]().withDefaultValue(0), Nil)
  val (_, result) = xs.foldLeft(initialState) { case ((count, res), x) =>
    if (count(x) >= max)
      (count, res)
    else
      (count.updated(x, count(x) + 1), x :: res)
  }
  result.reverse
}

Answer 7

distinct is defined in SeqLike as:

/** Builds a new $coll from this $coll without any duplicate elements.
 *  $willNotTerminateInf
 *
 *  @return  A new $coll which contains the first occurrence of every element of this $coll.
 */
def distinct: Repr = {
  val b = newBuilder
  val seen = mutable.HashSet[A]()
  for (x <- this) {
    if (!seen(x)) {
      b += x
      seen += x
    }
  }
  b.result()
}

We can define our function in a very similar way:

import scala.collection.mutable

def distinct2[A](ls: List[A]): List[A] = {
  val b = List.newBuilder[A]
  val seen1 = mutable.HashSet[A]()
  val seen2 = mutable.HashSet[A]()
  for (x <- ls) {
    if (!seen2(x)) {
      b += x
      if (!seen1(x)) {
        seen1 += x
      } else {
        seen2 += x
      }
    }
  }
  b.result()
}

scala> distinct2(l1)
res4: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

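This version uses internal state, but it is still pure. It is also easy to generalize to an arbitrary n (currently 2), but the specialized version is more performant.

You can implement the same thing with a fold that carries the "seen once" and "seen twice" sets along with it. However, a for loop and mutable state do the same job.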
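As a rough illustration of the fold mentioned above, here is a minimal sketch that threads two immutable sets through foldLeft. The name distinct2Fold is made up for this example and is not part of the answer.

```scala
// Hypothetical name: keeps at most two occurrences of each element, in order.
def distinct2Fold[A](ls: List[A]): List[A] = {
  // State: (elements seen once, elements seen twice, result in reverse order)
  val init = (Set.empty[A], Set.empty[A], List.empty[A])
  val (_, _, acc) = ls.foldLeft(init) { case ((once, twice, res), x) =>
    if (twice(x)) (once, twice, res)               // third or later occurrence: drop
    else if (once(x)) (once, twice + x, x :: res)  // second occurrence: keep
    else (once + x, twice, x :: res)               // first occurrence: keep
  }
  acc.reverse
}
```

Prepending to the accumulator and reversing once at the end avoids the linear cost of appending to a List on every step.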

Answer 8

How about this:

list
 .zipWithIndex
 .groupBy(_._1)
 .toSeq
 .flatMap { _._2.take(2) }       
 .sortBy(_._2)
 .map(_._1)

Answer 9

It's a bit ugly, but it's relatively fast:

val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1.foldLeft((Map[Int, Int](), List[Int]())) { case ((m, ls), x) => {
  val z = m + ((x, m.getOrElse(x, 0) + 1))
  (z, if (z(x) <= 2) x :: ls else ls)
}}._2.reverse

Gives: List(1, 2, 3, 1, 3, 2, 5)

Answer 10

Here is a recursive solution (it will overflow the stack for large lists):

  def filterAfter[T](l: List[T], max: Int): List[T] = {
    require(max > 1)
    //keep the state of seen values
    val seen = Map[T, Int]().withDefaultValue(0)//init to 0
    def filterAfter(l: List[T], seen: Map[T, Int]): (List[T], Map[T, Int]) = {
      l match {
        case x :: xs =>
          if (seen(x) < max) {
            //Update the state and pass to next
            val pair = filterAfter(xs, seen updated (x, seen(x) + 1))
            (x::pair._1, pair._2)
          } else {
            //already seen max times, skip this element
            filterAfter(xs, seen)
          }
        case _ => (l, seen)//empty, terminate recursion
      }
    }
    //call inner recursive function
    filterAfter(l, seen)._1
  }
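If the stack overflow on large lists is a concern, the same idea can be written with an accumulator so that the recursion is in tail position. This is only a sketch; the names filterAfterTailrec and loop are invented for the example.

```scala
import scala.annotation.tailrec

// Hypothetical name: a stack-safe variant of the recursive approach.
def filterAfterTailrec[T](l: List[T], max: Int): List[T] = {
  @tailrec
  def loop(rest: List[T], seen: Map[T, Int], acc: List[T]): List[T] =
    rest match {
      case x :: xs if seen(x) < max =>
        // Seen fewer than max times so far: keep it and bump its count
        loop(xs, seen.updated(x, seen(x) + 1), x :: acc)
      case _ :: xs =>
        // Already kept max copies of this element: skip it
        loop(xs, seen, acc)
      case Nil =>
        // acc was built in reverse order
        acc.reverse
    }
  loop(l, Map.empty[T, Int].withDefaultValue(0), Nil)
}
```

The @tailrec annotation makes the compiler reject the method if any recursive call is not in tail position.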

Answer 11

Here is canonical Scala code that reduces runs of three or more consecutive identical elements to two:

  def checkForTwo(candidate: List[Int]): List[Int] = {
    candidate match {
      case x :: y :: z :: tail if x == y && y == z =>
        checkForTwo(y :: z :: tail)
      case x :: tail => 
        x :: checkForTwo(tail)
      case Nil =>
        Nil
    }
  }

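It looks at the first three elements of the list; if they are all equal, it drops the first one and repeats the process. Otherwise, it passes the head element through.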
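Note that this approach only collapses adjacent repeats, so a list like l1 from the question, which has no run of three equal neighbours, comes back unchanged. A generic, fold-based sketch of the same idea, with the hypothetical name capRuns and assuming max >= 1, might look like this:

```scala
// Hypothetical generic variant: keep at most `max` adjacent equal elements.
// Assumes max >= 1 (the first element of every run is always kept).
def capRuns[A](xs: List[A], max: Int): List[A] = {
  // State: (result in reverse order, length of the run currently at its head)
  val (acc, _) = xs.foldLeft((List.empty[A], 0)) {
    case ((res @ (h :: _), run), x) if h == x =>
      // Same value as the previous kept element: keep only while the run is short
      if (run < max) (x :: res, run + 1) else (res, run)
    case ((res, _), x) =>
      // Empty result or a different value: start a new run
      (x :: res, 1)
  }
  acc.reverse
}
```

For example, capRuns(List(1, 1, 1, 1, 2, 2, 2, 3), 2) keeps two 1s, two 2s and the 3.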

Answer 12

A solution using groupBy and a filter, without any sorting (so it is O(N); sorting would typically cost you an additional O(N log(N))):

val li = l1.zipWithIndex
val pred = li.groupBy(_._1).flatMap(_._2.lift(1)) //1 is your "2", but - 1
for ((x, i) <- li if !pred.get(x).exists(_ < i)) yield x
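The snippet above hard-codes the limit through lift(1). As a sketch, the same groupBy idea can be wrapped in a function that takes the limit as a parameter; the names keepFirst and lastAllowed are invented here, and the function assumes max >= 1.

```scala
// Hypothetical wrapper around the groupBy approach, parameterized by `max`.
def keepFirst[A](xs: List[A], max: Int): List[A] = {
  require(max >= 1)
  val indexed = xs.zipWithIndex
  // For every value that occurs at least `max` times,
  // record the index of its max-th occurrence.
  val lastAllowed: Map[A, Int] =
    indexed.groupBy(_._1).collect {
      case (value, occurrences) if occurrences.lengthCompare(max) >= 0 =>
        value -> occurrences(max - 1)._2
    }
  // Keep an element if its value has no cutoff or its index is not past the cutoff.
  indexed.collect {
    case (x, i) if lastAllowed.get(x).forall(i <= _) => x
  }
}
```

Values that occur fewer than max times never get an entry in lastAllowed, so all of their occurrences pass the filter.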

Answer 13

I prefer to use an immutable Map:

  def noMoreThan[T](list: List[T], max: Int): List[T] = {
    def go(tail: List[T], freq: Map[T, Int]): List[T] = {
      tail match {
        case h :: t =>
          if (freq(h) < max)
            h :: go(t, freq + (h -> (freq(h) + 1)))
          else go(t, freq)
        case _ => Nil
      }
    }
    go(list, Map[T, Int]().withDefaultValue(0))
  }

How do I remove duplicates beyond the second occurrence from a list while keeping the original order?
