Question

Here is an imperative solution:

def longestCommonSubstring(a: String, b: String) : String = {
    def loop(m: Map[(Int, Int), Int], bestIndices: List[Int], i: Int, j: Int) : String = {
      if (i > a.length) {
        b.substring(bestIndices(1) - m((bestIndices(0),bestIndices(1))), bestIndices(1))
      } else if (i == 0 || j == 0) {
        loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
      } else if (a(i-1) == b(j-1) && math.max(m((bestIndices(0),bestIndices(1))), m((i-1,j-1)) + 1) == (m((i-1,j-1)) + 1)) {
        loop(
          m + ((i,j) -> (m((i-1,j-1)) + 1)),
          List(i, j),
          if(j == b.length) i + 1 else i,
          if(j == b.length) 0 else j + 1
        )
      } else {
        loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
      }
    }
    loop(Map[(Int, Int), Int](), List(0, 0), 0, 0)
  }

I'm looking for a more compact, functional way to find the longest common substring.

Answer 1

def getAllSubstrings(str: String): Set[String] = {
  str.inits.flatMap(_.tails).toSet
}
def longestCommonSubstring(str1: String, str2: String): String = {
  val str1Substrings = getAllSubstrings(str1)
  val str2Substrings = getAllSubstrings(str2)

  str1Substrings.intersect(str2Substrings).maxBy(_.length)
}

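First get all possible substrings of both strings (taken from here) as sets, which removes duplicates. Then intersect the two sets and pick the longest of the common substrings.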
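As a small illustration of what the inits/tails combination enumerates, here is a sketch that reuses getAllSubstrings from above. The wrapper object name AllSubstringsDemo is made up for this example.

```scala
object AllSubstringsDemo {
  // inits yields every prefix of the string; tails yields every suffix of
  // each prefix. Together they enumerate all substrings, including "".
  def getAllSubstrings(str: String): Set[String] =
    str.inits.flatMap(_.tails).toSet

  def longestCommonSubstring(str1: String, str2: String): String =
    getAllSubstrings(str1).intersect(getAllSubstrings(str2)).maxBy(_.length)

  def main(args: Array[String]): Unit = {
    println(getAllSubstrings("abc"))               // the 7 substrings of "abc"
    println(longestCommonSubstring("abc", "xbcy")) // bc
  }
}
```

Because the empty string is always in both sets, the intersection is never empty and maxBy does not throw.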

Answer 2

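The code you have is already functional, and not that complex. It also has asymptotically better time efficiency than the other solutions posted so far.

I would just simplify it, clean it up a bit and fix the bug: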

def longestCommonSubstring(a: String, b: String): String = {
  // bestLengths((i, j)) is the length of the common suffix of a.take(i) and b.take(j);
  // keys that were never stored (row 0 and column 0) default to 0.
  def loop(bestLengths: Map[(Int, Int), Int], bestIndices: (Int, Int), i: Int, j: Int): String = {
    if (i > a.length) {
      val bestJ = bestIndices._2
      b.substring(bestJ - bestLengths(bestIndices), bestJ)
    } else {
      val currentLength = if (a(i - 1) == b(j - 1)) bestLengths((i - 1, j - 1)) + 1 else 0
      loop(
        bestLengths + ((i, j) -> currentLength),
        if (currentLength > bestLengths(bestIndices)) (i, j) else bestIndices,
        if (j == b.length) i + 1 else i,
        if (j == b.length) 1 else j + 1)
    }
  }

  // Guard against an empty b, which would otherwise be indexed at b(0).
  if (b.isEmpty) ""
  else loop(Map.empty[(Int, Int), Int].withDefaultValue(0), (0, 0), 1, 1)
}

Answer 3

A solution could look like this:

def substrings(a: String, len: Int): Stream[String] =
  if (len == 0)
    Stream.empty
  else
    // #::: lazily appends the stream of shorter substrings
    a.tails.toStream.takeWhile(_.size >= len).map(_.take(len)) #::: substrings(a, len - 1)

def longestCommonSubstring(a: String, b: String): Option[String] =
  substrings(a, a.length).dropWhile(sub => !b.contains(sub)).headOption

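Here the substrings method returns a Stream that produces the substrings of the original string in order of decreasing length. For example, "test" produces "test", "tes", "est", "te", "es", ...

The longestCommonSubstring method takes the first substring generated from a that is contained in the string b.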

Answer 4

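UPDATE: After posting this answer, and thanks to feedback from @Kolmar, I discovered that a Char indexing strategy is substantially faster (at least an order of magnitude). I have now added an additional answer (per StackOverflow's policy) that covers that substantially faster solution in functional Scala style.

I should have paid closer attention to the implementation the OP specifically provided. Unfortunately, I was distracted by all the other answers that use inefficient String-oriented comparisons, and went off to provide my own optimized version of those, pleased that I could use Stream and LazyList.

In addition to the OP's request, I had several additional requirements for composing a solution that finds the longest common substring (LCS) between two String instances.

Solution requirements:

1. Eagerly find the first LCS between two String instances
2. Minimize CPU effort by comparing fewer String instances
3. Minimize GC effort by producing fewer String instances
4. Maximize idiomatic Scala, including use of the Scala Collections API

The first goal is to capture the general search strategy. The process starts with the left String instance and produces an ordered list of its substrings, from the longest (the original String instance itself) to the shortest (single characters). For example, if the left String instance contains "ABCDEF", the resulting list of String instances should be produced in exactly this order: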

[
  ABCDEF,
  ABCDE, BCDEF,
  ABCD, BCDE, CDEF,
  ABC, BCD, CDE, DEF,
  AB, BC, CD, DE, EF,
  A, B, C, D, E, F
]
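The ordering shown above can be sketched with sliding windows of decreasing size. This is only an illustration of the ordering, not the answer's own implementation; the names SubstringOrder and longestFirst are hypothetical.

```scala
object SubstringOrder {
  // For each window size from the full length down to 1, slide a window
  // across the string from left to right. The iterator is lazy, so shorter
  // substrings are only created if the consumer asks for them.
  def longestFirst(s: String): Iterator[String] =
    (s.length to 1 by -1).iterator.flatMap(size => s.sliding(size))

  def main(args: Array[String]): Unit =
    println(longestFirst("ABCD").mkString(", "))
}
```

An empty input yields an empty iterator, because the range from 0 down to 1 contains no sizes.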

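Next, an iteration starts through this list of left substring instances, stopping as soon as a particular left substring instance is found at any index within the right String instance. When a left substring instance is found, it is returned. Otherwise, an indication that no match was found is returned.

There are two things to note about the "eager" approach to satisfying solution requirement #1:

The left substring instance that is found may appear at more than one index in the right String instance. This means that searching for the left substring instance from the start of the right String instance using indexOf may yield a different index than searching from the end of the right String instance using lastIndexOf.
There may be a second (different) left substring instance of the same length that also appears in the right String instance. This implementation ignores that possibility.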
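Both caveats can be seen on small inputs. The following is a simplified sketch of the eager search (it does not swap the shorter and longer inputs); the names EagerSearchNotes and firstLcs are made up for illustration.

```scala
object EagerSearchNotes {
  // Longest-first search: return the first substring of `left`
  // that occurs anywhere in `right`.
  def firstLcs(left: String, right: String): Option[String] =
    (left.length to 1 by -1).iterator
      .flatMap(size => left.sliding(size))
      .find(sub => right.contains(sub))

  def main(args: Array[String]): Unit = {
    // Caveat 1: the match can occur at several indexes in `right`.
    println("ABXAB".indexOf("AB"))     // 0
    println("ABXAB".lastIndexOf("AB")) // 3

    // Caveat 2: "AB" and "CD" are both common substrings of length 2,
    // but only the one generated first from `left` is reported.
    println(firstLcs("ABXCD", "CDYAB")) // Some(AB)
  }
}
```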

Solution for Scala 2.13 / Dotty (a.k.a. 3.0), using LazyList:

Stream has been deprecated as of 2.13.

def longestCommonSubstring(left: String, right: String): Option[String] =
  if (left.nonEmpty && right.nonEmpty) {
    def substrings(string: String): LazyList[String] = {
      def recursive(size: Int = string.length): LazyList[String] = {
        if (size > 0) {
          def ofSameLength: LazyList[String] =
            (0 to (string.length - size))
              .iterator.to(LazyList)
              .map(offset => string.substring(offset, offset + size))

          ofSameLength #::: recursive(size - 1)
        }
        else
          LazyList.empty
      }

      recursive()
    }

    val (shorter, longer) =
      if (left.length <= right.length)
        (left, right)
      else
        (right, left)

    substrings(shorter).find(longer.contains)
  }
  else
    None

Solution for Scala 2.12 and earlier, using Stream:

def longestCommonSubstring(left: String, right: String): Option[String] =
  if (left.nonEmpty && right.nonEmpty) {
    def substrings(string: String): Stream[String] = {
      def recursive(size: Int = string.length): Stream[String] = {
        if (size > 0) {
          def ofSameLength: Stream[String] =
            (0 to (string.length - size))
              .toStream
              .map(offset => string.substring(offset, offset + size))

          ofSameLength #::: recursive(size - 1)
        }
        else
          Stream.empty
      }

      recursive()
    }

    val (shorter, longer) =
      if (left.length <= right.length)
        (left, right)
      else
        (right, left)

    substrings(shorter).find(longer.contains)
  }
  else
    None

Notes:

A visual diff between the two versions is provided to quickly see the differences.
To cover the various edge cases (e.g. supplying an empty String as input), Option[String] is used as the function's return type.
To meet solution requirements #2 and #3, the shorter of the two String instances is assigned to shorter. This reduces the instantiations of, and comparisons with, String substrings that are longer than the shorter String instance.
To meet solution requirements #2 and #3, the String substring instances are generated in sub-batches of equal size, with duplicates to be removed via distinct (the listings above do not include that call; it can be appended to ofSameLength). Each sub-batch is then appended to the LazyList (or Stream). This ensures that only the shorter substring instances that are actually needed are instantiated and passed to the longer.contains function.

Answer 5

I think the code looks very clear and functional when written with a for comprehension.

def getSubstrings(s: String) =
  for {
    start <- 0 until s.size
    end <- start to s.size
  } yield s.substring(start, end)

def getLongest(one: String, two: String): Seq[String] =
  getSubstrings(one).intersect(getSubstrings(two))
    .groupBy(_.length).maxBy(_._1)._2

The final function returns a Seq[String], because the result may contain several substrings of the same maximum length.
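To show the multi-result behaviour on a concrete input, here is a short sketch that reuses the two functions above. The wrapper object ForComprehensionDemo and the sample strings are chosen for illustration only.

```scala
object ForComprehensionDemo {
  // Every substring s.substring(start, end) with start <= end,
  // which includes one empty string per start offset.
  def getSubstrings(s: String) =
    for {
      start <- 0 until s.size
      end <- start to s.size
    } yield s.substring(start, end)

  // Group the common substrings by length and keep the longest group.
  def getLongest(one: String, two: String): Seq[String] =
    getSubstrings(one).intersect(getSubstrings(two))
      .groupBy(_.length).maxBy(_._1)._2

  def main(args: Array[String]): Unit =
    // "AB" and "CD" are both common substrings of length 2.
    println(getLongest("ABXCD", "CDYAB"))
}
```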

Answer 6

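Side note: This is my second answer to this question, because StackOverflow policy does not allow fundamentally replacing the content of a prior answer. Thanks to feedback from @Kolmar, this new answer performs far better than my prior answer.

A great deal of time has been invested in the LCS (longest common substring) problem space to find optimal solution strategies. To see the more general computer science problem and the optimal strategies, take a look at this Wikipedia article. Further down in that Wikipedia article is some pseudocode describing an implementation strategy.

Based on the Wikipedia article's pseudocode, I will present several different solutions. The intention is to let people copy/paste the specific solution they need without a lot of refactoring:

LCSubstr: a translation of the Wikipedia article pseudocode into Scala, using an imperative, mutable style
LCSubstrFp: a refactoring of LCSubstr into idiomatic Scala with a functional, immutable style
longestCommonSubstrings: a refactoring of LCSubstrFp to use descriptive names (e.g. left and right instead of s and t) and to skip storing zero lengths in the Map
longestCommonSubstringsFast: a refactoring of longestCommonSubstrings to deeply optimize for CPU and memory
longestCommonSubstringsWithIndexes: a refactoring of longestCommonSubstringsFast that enhances the return value by expanding each entry into a tuple of (String, (Int, Int)), which contains both the substring found and the index in each input String at which it was found (NOTE: if the same String appears more than once, this creates a combinatorial expansion of index pairs)
firstLongestCommonSubstring: an efficiency-focused version of longestCommonSubstringsFast that offers the opportunity to terminate early when only the first LCS matters and other LCSes of the same size can be ignored
BONUS: longestCommonSubstringsUltimate: a refactoring of longestCommonSubstringsFast that adds mutability to the internal implementation while preserving the function's referential transparency externally

The more direct answer to the OP's request falls somewhere between LCSubstrFp and longestCommonSubstringsFast. LCSubstrFp is the most direct, but it is quite inefficient. Using longestCommonSubstringsFast is substantially more efficient, since it ends up using far less CPU and GC. And if internal mutability that is contained and constrained within the function's implementation is acceptable, then longestCommonSubstringsUltimate is by far the version with the smallest CPU burden and memory footprint.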

LCSubstr

A translation of the Wikipedia article pseudocode into Scala, using an imperative, mutable style.

The intention is to reproduce the pseudocode in Scala as close to one-to-one as possible. For example, Scala assumes zero-based indexing for String, whereas the pseudocode clearly uses one-based indexing, which required some adjustments.

def LCSubstr(s: String, t: String): scala.collection.mutable.Set[String] =
  if (s.nonEmpty && t.nonEmpty) {
    // l((i, j)) is the length of the common suffix ending at s(i) and t(j)
    val l: scala.collection.mutable.Map[(Int, Int), Int] = scala.collection.mutable.Map.empty
    // z is the longest length found so far
    var z: Int = 0
    var ret: scala.collection.mutable.Set[String] = scala.collection.mutable.Set.empty

    (0 until s.length).foreach { i =>
      (0 until t.length).foreach { j =>
        if (s(i) == t(j)) {
          if ((i == 0) || (j == 0))
            l += ((i, j) -> 1)
          else
            l += ((i, j) -> (l((i - 1, j - 1)) + 1))
          if (l((i, j)) > z) {
            z = l((i, j))
            ret = scala.collection.mutable.Set(s.substring(i - z + 1, i + 1))
          }
          else if (l((i, j)) == z)
            ret += s.substring(i - z + 1, i + 1)
        }
        else
          l += ((i, j) -> 0)
      }
    }

    ret
  }
  else
    scala.collection.mutable.Set.empty

LCSubstrFp

A refactoring of LCSubstr into idiomatic Scala with a functional, immutable style.

All imperative and mutating code has been replaced with functional and immutable counterparts. The two for loops have been replaced with recursion.

def LCSubstrFp(s: String, t: String): Set[String] =
  if (s.nonEmpty && t.nonEmpty) {
    @scala.annotation.tailrec
    def recursive(
      i: Int = 0,
      j: Int = 0,
      z: Int = 0,
      l: Map[(Int, Int), Int] = Map.empty,
      ret: Set[String] = Set.empty
    ): Set[String] =
      if (i < s.length) {
        // advance column-first, then wrap to the next row
        val (newI, newJ) =
          if (j < t.length - 1) (i, j + 1)
          else (i + 1, 0)
        val lij =
          if (s(i) != t(j)) 0
          else if ((i == 0) || (j == 0)) 1
          else l((i - 1, j - 1)) + 1
        recursive(
          newI,
          newJ,
          if (lij > z) lij else z,
          l + ((i, j) -> lij),
          if (lij > z) Set(s.substring(i - lij + 1, i + 1))
          else if ((lij == z) && (z > 0)) ret + s.substring(i - lij + 1, i + 1)
          else ret
        )
      }
      else
        ret

    recursive()
  }
  else
    Set.empty

longestCommonSubstrings

A refactoring of LCSubstrFp to use descriptive names (e.g. left and right instead of s and t) and to skip storing zero lengths in the Map.

Besides improving readability, this refactoring dispenses with storing zero-length values in lengthByIndexLongerAndIndexShorter, which substantially reduces the amount of "memory churn". In a further adjustment towards the functional style, the return value has been enhanced so that it never returns an empty Set, by wrapping the Set in an Option. If the returned value is a Some, the contained Set will always contain at least one item.

def longestCommonSubstrings(left: String, right: String): Option[Set[String]] =
  if (left.nonEmpty && right.nonEmpty) {
    val (shorter, longer) =
      if (left.length < right.length) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthByIndexLongerAndIndexShorter: Map[(Int, Int), Int] = Map.empty,
      accumulator: List[Int] = Nil
    ): (Int, List[Int]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else if ((indexShorter == 0) || (indexLonger == 0)) 1
          else lengthByIndexLongerAndIndexShorter.getOrElse((indexLonger - 1, indexShorter - 1), 0) + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        // only non-zero lengths are stored
        val newLengthByIndexLongerAndIndexShorter =
          if (length > 0) lengthByIndexLongerAndIndexShorter + ((indexLonger, indexShorter) -> length)
          else lengthByIndexLongerAndIndexShorter
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = indexShorter - length + 1
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            newLengthByIndexLongerAndIndexShorter,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            newLengthByIndexLongerAndIndexShorter,
            newAccumulator
          )
      }
      else
        (currentLongestLength, accumulator)

    val (length, indexShorters) = recursive()
    if (indexShorters.nonEmpty)
      Some(
        indexShorters
          .map { indexShorter =>
            shorter.substring(indexShorter, indexShorter + length)
          }
          .toSet
      )
    else
      None
  }
  else
    None

longestCommonSubstringsFast

A refactoring of longestCommonSubstrings to deeply optimize for CPU and memory.

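While remaining functional and immutable, this version eliminates all of the inefficiencies and executes several times faster than longestCommonSubstrings. Most of the cost reduction comes from replacing the Map that holds the entire matrix with a pair of Lists that track only the current and prior rows.

To easily see the differences from longestCommonSubstrings, please view this visual diff.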
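The idea of keeping only the prior row can be sketched compactly with a fold. The following is a simplified illustration, not the answer's implementation: it uses a Vector per row instead of a pair of Lists, returns only the first longest match, and the names TwoRowLcs and longestCommonSubstring are assumptions made for this example.

```scala
object TwoRowLcs {
  def longestCommonSubstring(left: String, right: String): String = {
    val zeroRow = Vector.fill(right.length)(0)
    // Fold state: (prior row, best length so far, exclusive end index in `left`)
    val (_, bestLen, bestEnd) =
      left.indices.foldLeft((zeroRow, 0, 0)) { case ((prev, len, end), i) =>
        // Shift the prior row right by one so that shifted(j) == prev(j - 1),
        // with 0 standing in for the missing column -1.
        val shifted = 0 +: prev.dropRight(1)
        val row = right.indices.map { j =>
          if (left(i) == right(j)) shifted(j) + 1 else 0
        }.toVector
        val rowMax = if (row.isEmpty) 0 else row.max
        // Strictly greater keeps the first longest match.
        if (rowMax > len) (row, rowMax, i + 1) else (row, len, end)
      }
    left.substring(bestEnd - bestLen, bestEnd)
  }
}
```

Each step only reads the row computed by the previous step, so memory use is proportional to the length of right rather than to the full matrix.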

def longestCommonSubstringsFast(left: String, right: String): Option[Set[String]] =
  if (left.nonEmpty && right.nonEmpty) {
    val (shorter, longer) =
      if (left.length < right.length) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: List[Int] = Nil
    ): (Int, List[Int]) =
      if (indexLonger < longer.length) {
        // lengthsPrior.head is the prior row's value one column to the left
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = indexShorter - length + 1
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            lengthsPrior.tail,
            length :: lengths,
            newAccumulator
          )
        else
          // end of row: the row just built becomes the prior row
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            0 :: lengths.reverse,
            Nil,
            newAccumulator
          )
      }
      else
        (currentLongestLength, accumulator)

    val (length, indexShorters) = recursive()
    if (indexShorters.nonEmpty)
      Some(
        indexShorters
          .map { indexShorter =>
            shorter.substring(indexShorter, indexShorter + length)
          }
          .toSet
      )
    else
      None
  }
  else
    None

longestCommonSubstringsWithIndexes

A refactoring of longestCommonSubstringsFast that enhances the return value by expanding each entry into a tuple of (String, (Int, Int)), which contains both the substring found and the index in each input String at which the substring was found.

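NOTE: If the same String appears more than once, this creates a combinatorial expansion of index pairs.

In a further adjustment towards the functional style, the return value has been enhanced so that it never returns an empty List, by wrapping the List in an Option. If the returned value is a Some, the contained List will always contain at least one item.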
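To make the combinatorial expansion mentioned in the note concrete, here is a small standalone sketch. It does not use the answer's function; the helpers occurrences and indexPairs and the object IndexPairsDemo are hypothetical names introduced only for illustration.

```scala
object IndexPairsDemo {
  // All start offsets at which `sub` occurs in `s` (overlapping matches included).
  def occurrences(s: String, sub: String): List[Int] =
    (0 to s.length - sub.length).filter(i => s.startsWith(sub, i)).toList

  // Every (left index, right index) combination for one common substring.
  def indexPairs(left: String, right: String, sub: String): List[(Int, Int)] =
    for {
      l <- occurrences(left, sub)
      r <- occurrences(right, sub)
    } yield (l, r)

  def main(args: Array[String]): Unit =
    // "AB" occurs twice in each input, giving 2 x 2 = 4 index pairs.
    println(indexPairs("ABAB", "ABXAB", "AB"))
}
```

With two occurrences on each side the result already has four entries, which is why a substring that repeats often can inflate the returned List.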

def longestCommonSubstringsWithIndexes(left: String, right: String): Option[List[(String, (Int, Int))]] =
  if (left.nonEmpty && right.nonEmpty) {
    val isLeftShorter = left.length < right.length
    val (shorter, longer) =
      if (isLeftShorter) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: List[(Int, Int)] = Nil
    ): (Int, List[(Int, Int)]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            // start offsets of the match in longer and in shorter
            val entry = (indexLonger - length + 1, indexShorter - length + 1)
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            lengthsPrior.tail,
            length :: lengths,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            0 :: lengths.reverse,
            Nil,
            newAccumulator
          )
      }
      else
        (currentLongestLength, accumulator)

    val (length, indexPairs) = recursive()
    if (indexPairs.nonEmpty)
      Some(
        indexPairs
          .reverse
          .map { indexPair =>
            (
              longer.substring(indexPair._1, indexPair._1 + length),
              // report the pair in (left, right) order
              if (isLeftShorter) indexPair.swap else indexPair
            )
          }
      )
    else
      None
  }
  else
    None

firstLongestCommonSubstring

An efficiency-focused version of longestCommonSubstringsFast that offers the opportunity to terminate early when only the first LCS matters and other LCSes of the same size can be ignored.

def firstLongestCommonSubstring(left: String, right: String): Option[(String, (Int, Int))] =
  if (left.nonEmpty && right.nonEmpty) {
    val isLeftShorter = left.length < right.length
    val (shorter, longer) =
      if (isLeftShorter) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: Option[(Int, Int)] = None
    ): Option[(Int, (Int, Int))] =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newAccumulator =
          if (length > currentLongestLength) Some((indexLonger - length + 1, indexShorter - length + 1))
          else accumulator
        if (length < shorter.length) {
          val newCurrentLongestLength =
            if (length > currentLongestLength) length
            else currentLongestLength
          if (indexShorter < shorter.length - 1)
            recursive(
              indexLonger,
              indexShorter + 1,
              newCurrentLongestLength,
              lengthsPrior.tail,
              length :: lengths,
              newAccumulator
            )
          else
            recursive(
              indexLonger + 1,
              0,
              newCurrentLongestLength,
              0 :: lengths.reverse,
              Nil,
              newAccumulator
            )
        }
        else
          recursive(longer.length, 0, length, lengthsPrior, lengths, newAccumulator) // early terminate
      }
      else
        accumulator.map((currentLongestLength, _))

    recursive().map {
      case (length, indexPair) =>
        (
          longer.substring(indexPair._1, indexPair._1 + length),
          if (isLeftShorter) indexPair.swap else indexPair
        )
    }
  }
  else
    None

BONUS:

longestCommonSubstringsUltimate

A refactoring of longestCommonSubstringsFast that adds mutability to the internal implementation while preserving the function's referential transparency externally.

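This version eliminates further inefficiencies while remaining functional and referentially transparent from the outside. It does so by exploiting mutability inside the implementation itself, which some argue is not validly functional. It executes almost three times as fast as longestCommonSubstringsFast. Most of the cost reduction comes from replacing the pair of Lists with a single Array.

To easily see the differences from longestCommonSubstringsFast, please view this visual diff.

(The longestCommonSubstringsUltimate listing is missing from this copy of the answer.)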
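As a rough illustration of the single-Array idea described above, here is an independent sketch. It is not the answer's longestCommonSubstringsUltimate listing; the object name ArrayLcsSketch and all implementation details are assumptions. The mutation stays inside the function, so callers still see a pure function.

```scala
object ArrayLcsSketch {
  def longestCommonSubstrings(left: String, right: String): Option[Set[String]] =
    if (left.isEmpty || right.isEmpty) None
    else {
      val (shorter, longer) =
        if (left.length < right.length) (left, right) else (right, left)
      // lengths(j + 1) holds the common-suffix length ending at shorter(j);
      // lengths(0) stays 0 and acts as a sentinel for column -1.
      val lengths = new Array[Int](shorter.length + 1)
      var best = 0
      var starts = List.empty[Int] // start offsets in `shorter`
      var i = 0
      while (i < longer.length) {
        // Walk right to left so lengths(j) still holds the prior row's value
        // when lengths(j + 1) is overwritten.
        var j = shorter.length - 1
        while (j >= 0) {
          val length = if (longer(i) == shorter(j)) lengths(j) + 1 else 0
          lengths(j + 1) = length
          if (length > 0) {
            if (length > best) { best = length; starts = List(j - length + 1) }
            else if (length == best) starts = (j - length + 1) :: starts
          }
          j -= 1
        }
        i += 1
      }
      if (best == 0) None
      else Some(starts.map(s => shorter.substring(s, s + best)).toSet)
    }
}
```

Updating the row in place from right to left is what lets one Array replace the current-row and prior-row pair.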

Answer 7

How about this approach:

Get all substrings:

left.inits.flatMap(_.tails)

Sort them by length in descending order:

.toList.sortBy(_.length).reverse

Find the first match:

.find(right.contains(_)).get

Full function:

  def lcs(left: String, right: String) = {
    left.inits.flatMap(_.tails)
      .toList.sortBy(_.length).reverse
      .find(right.contains(_)).get
  }

Note: get never fails here, because the initial set of substrings also contains the empty string, which always matches something.
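The note can be checked on inputs with nothing in common. This sketch wraps the lcs function from above in an object named EmptyMatchDemo, a name chosen only for this example.

```scala
object EmptyMatchDemo {
  def lcs(left: String, right: String): String =
    left.inits.flatMap(_.tails)
      .toList.sortBy(_.length).reverse
      .find(right.contains(_)).get

  def main(args: Array[String]): Unit = {
    println("xyz".contains(""))           // true: every string contains ""
    println(lcs("abc", "xyz").isEmpty)    // true: falls back to ""
    println(lcs("ABCDEF", "XBCDY"))       // BCD
  }
}
```

A caller that wants to distinguish "no common substring" from a real match therefore has to test the result for emptiness.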

Functional way to find the longest common substring between two strings in Scala

7 answers: