Question

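I have been trying to compress a string. Given a string like this:

AAABBCAADEEFF, I need to compress it to 3A2B1C2A1D2E2F

I was able to come up with a tail-recursive implementation: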

    @scala.annotation.tailrec
    def compress(str: List[Char], current: Seq[Char], acc: Map[Int, String]): String = str match {
      case Nil =>
        if (current.nonEmpty)
          s"${acc.values.mkString("")}${current.length}${current.head}"
        else
          s"${acc.values.mkString("")}"
      case List(x) if current.contains(x) =>
        val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length + 1}${current.head}")
        compress(List.empty[Char], Seq.empty[Char], newMap)
      case x :: xs if current.isEmpty =>
        compress(xs, Seq(x), acc)
      case x :: xs if !current.contains(x) =>
        if (acc.nonEmpty) {
          val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length}${current.head}")
          compress(xs, Seq(x), newMap)
        } else {
          compress(xs, Seq(x), acc ++ Map(1 -> s"${current.length}${current.head}"))
        }
      case x :: xs =>
        compress(xs, current :+ x, acc)
    }

    // Produces 2F3A2B1C2A instead of 3A2B1C2A1D2E2F
    compress("AAABBCAADEEFF".toList, Seq.empty[Char], Map.empty[Int, String])

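But it fails for the given case! I'm not sure which edge case I'm missing. Any help?

What I'm actually doing is walking through the sequence of characters and collecting identical characters into a new sequence. As long as the next character from the original input (the first parameter of compress) is found in current (the second parameter of compress), I keep collecting it.

If it is not, I empty the current sequence, count the collected elements, and push the result into the Map. It fails for some edge case that I can't figure out.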

Answer 1

I came up with this solution:

def compress(word: List[Char]): List[(Char, Int)] =
  word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
  acc match {
    case Nil => List(e)
    case ((c, i)::rest) => if (c == e._1) (c, i + 1)::rest else e::acc
  })

Basically, this is a map followed by a right fold.
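
A quick usage sketch, since the function above returns (Char, Int) pairs rather than the string the question asks for. The render helper is a hypothetical name added here for illustration and is not part of the original answer.

```scala
// The answer's function, unchanged apart from indentation.
def compress(word: List[Char]): List[(Char, Int)] =
  word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
    acc match {
      case Nil => List(e)
      case (c, i) :: rest => if (c == e._1) (c, i + 1) :: rest else e :: acc
    })

// Hypothetical helper: turns the (char, count) pairs into "3A2B..." form.
def render(pairs: List[(Char, Int)]): String =
  pairs.map { case (c, n) => s"$n$c" }.mkString

val pairs = compress("AAABBCAADEEFF".toList)
// List((A,3), (B,2), (C,1), (A,2), (D,1), (E,2), (F,2))
val text = render(pairs)
// "3A2B1C2A1D2E2F"
```

Because foldRight consumes the input from the right and prepends to the accumulator, the pairs come out in input order and no reverse is needed.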

Answer 2

Taking inspiration from @nicodp's code:

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

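First, our intermediate result is a List[(Char, Int)]: a list of tuples in which each character is accompanied by its count.

Now let's go through the string one char at a time using the great foldLeft!

We accumulate the result in the acc variable, while e stands for the current element.

acc has type List[(Char, Int)] and e has type Char.

When we start, we are at the first char of the string and acc is the empty list. So we attach the first tuple to the front of acc with a count of one.

When acc is Nil we produce (e, 1) :: Nil, which is the same as (e, 1) :: acc because acc is Nil.

The front of the list is now the node we are interested in.

Let's move on to the second element. Now acc has one element: the first char with a count of 1.

We compare the current element with the front element of the list. If they match, we increment the count and put (element, incrementedCount) at the front of the list in place of the old tuple.

If the current element does not match the front element, we have a new element. So we attach the new element with a count of 1 to the front of the list, and so on.

Finally, the List[(Char, Int)] is converted to the required string representation. Because new runs are always prepended, the list is in reverse order at this point, which is why the code calls .reverse first.

Note: we use the front element of the list, which can be accessed in O(1) (constant time), as a buffer and increment its count whenever the same element is found again.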
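
To make the walkthrough above concrete, here is a minimal sketch that swaps foldLeft for scanLeft so that every intermediate value of acc is kept. The names step and trace are hypothetical; step is just the fold body from encode pulled out into its own function.

```scala
// Same three cases as the fold body in encode, under a hypothetical name.
def step(acc: List[(Char, Int)], e: Char): List[(Char, Int)] =
  acc match {
    case Nil => (e, 1) :: Nil
    case (lastChar, lastCharCount) :: xs if lastChar == e =>
      (lastChar, lastCharCount + 1) :: xs
    case xs => (e, 1) :: xs
  }

// scanLeft returns every intermediate accumulator, starting with the empty list.
val trace = "AAB".toList.scanLeft(List.empty[(Char, Int)])(step)
trace.foreach(println)
// List()
// List((A,1))
// List((A,2))
// List((B,1), (A,2))
```

The last line shows the most recent run at the front of the list, which is the reversed order that encode undoes with .reverse.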

Scala REPL

scala> :paste
// Entering paste mode (ctrl-D to finish)

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

// Exiting paste mode, now interpreting.

encode: (word: String)String

scala> encode("AAABBCAADEEFF")
res0: String = 3A2B1C2A1D2E2F

Using the backtick-quoted `e` in the pattern match instead of a guard is more concise:

   def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((`e`, lastCharCount) :: xs) => (e, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
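
As a small illustration of why the backticks matter, the sketch below contrasts the two forms. The function names matchesE and bindsAnything are hypothetical and exist only for this example.

```scala
val e = 'A'

// With backticks, `e` is a stable identifier pattern:
// the scrutinee is compared for equality with the existing value e.
def matchesE(c: Char): Boolean = c match {
  case `e` => true
  case _   => false
}

// Without backticks, a lowercase name in a pattern is a fresh variable:
// it matches any value and shadows the outer e inside the case body.
def bindsAnything(c: Char): Char = c match {
  case e => e
}

matchesE('A')      // true
matchesE('B')      // false
bindsAnything('B') // 'B', not 'A'
```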

Answer 3

Here is another, more simplified approach, based upon this answer:

class StringCompressinator {
  def compress(raw: String): String = {
    val split: Array[String] = raw.split("(?<=(.))(?!\\1)", 0) // creates array of the repeated chars as strings

    val converted = split.map(group => {
      val char = group.charAt(0) // take the first char of the group string
      s"${group.length}${char}"  // prefix the char with the group length: "AAA" becomes "3A"
    })
    converted.mkString("") // converted is still an array; mkString joins it into one string
  }
}

import org.scalatest.FunSuite

class StringCompressinatorTest extends FunSuite {

  test("testCompress") {
    val compress = (new StringCompressinator).compress(_)
    val input = "AAABBCAADEEFF"
    assert(compress(input) == "3A2B1C2A1D2E2F")
  }
}
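
To see what the regular expression does on its own, here is a minimal sketch that runs only the split step from the class above. The value name groups is hypothetical.

```scala
// (?<=(.)) is a lookbehind that captures the character just before the
// current position; (?!\1) is a negative lookahead that requires the next
// character to differ from it. The match is zero-width, so the split
// happens exactly at the boundary between two different runs.
val groups: Array[String] = "AAABBCAADEEFF".split("(?<=(.))(?!\\1)")
println(groups.mkString(", "))
// AAA, BB, C, AA, D, EE, FF
```

One caveat: for an empty input, split returns an array containing a single empty string, so group.charAt(0) in the class above would throw. A guard for that case is not shown here.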

Answer 4

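A similar idea with a slight difference:

A case class is used for pattern matching on the head, so we don't need an if. It also helps with printing the final result by overriding toString.

An uppercase letter is used as the variable name when pattern matching (it's either that or backticks; I don't know which one I prefer :P).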

   case class Count(c : Char, cnt : Int){
      override def toString = s"$cnt$c"
   }

   def compressor( counts : List[Count], C : Char ) = counts match {
        case Count(C, cnt) :: tail => Count(C, cnt + 1) :: tail
        case _ => Count(C, 1) :: counts
   }

   "AAABBCAADEEFF".foldLeft(List[Count]())(compressor).reverse.mkString
   //"3A2B1C2A1D2E2F"
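
The uppercase-name trick mentioned above can be seen in isolation in the sketch below. Both function names are hypothetical and only serve to contrast the two spellings.

```scala
// An uppercase name in a pattern refers to an existing value,
// so x is compared for equality with the parameter C.
def sameAsUpper(C: Char, x: Char): Boolean = x match {
  case C => true
  case _ => false
}

// A lowercase name in a pattern is a fresh binding that matches any x,
// so the parameter c is never consulted.
def sameAsLower(c: Char, x: Char): Boolean = x match {
  case c => true
}

sameAsUpper('A', 'A') // true
sameAsUpper('A', 'B') // false
sameAsLower('A', 'B') // true, because the pattern c matches anything
```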

Compressing a given text string in Scala
