How can I lazily clean and trim two strings?

Time: 2017-06-24 17:55:11

Tags: string scala lazy-evaluation

This is a follow-up to my previous question.

Suppose I need to remove certain characters from two input strings, s1 and s2, and then return substrings of the results, t1 and t2, such that:

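  • t1 and t2 are "cleaned" (they contain none of the characters to remove)
  • t1 and t2 have the same length
  • t1 and t2 are at most k characters long
  • t1 and t2 are as long as possible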

I can write a suboptimal implementation that scans the entire input, like this:

def cleanTrim(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  val cleaned1 = s1.filterNot(chars)
  val cleaned2 = s2.filterNot(chars)
  val k1 = math.min(cleaned1.length, k)
  val k2 = math.min(cleaned2.length, k)
  val n = math.min(k1, k2)
  val t1 = cleaned1.substring(0, n)
  val t2 = cleaned2.substring(0, n)
  (t1, t2)
}
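
To make the four requirements concrete, here is a small worked example. It is only a sketch: the function repeats the logic above in condensed form so the snippet runs on its own, and the input strings and the set of characters to remove are made up for illustration.

```scala
// Same logic as cleanTrim above, condensed so this snippet is self-contained.
def cleanTrim(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  val cleaned1 = s1.filterNot(chars)
  val cleaned2 = s2.filterNot(chars)
  val n = math.min(k, math.min(cleaned1.length, cleaned2.length))
  (cleaned1.substring(0, n), cleaned2.substring(0, n))
}

// Hypothetical inputs: '-' and '_' are the characters to remove.
val remove = Set('-', '_')

// The cleaned inputs are "abcd" and "xyz"; the shorter one (3 chars) limits the result.
val shortLimits = cleanTrim("a-b-c-d", "x__y_z", remove, 10)  // ("abc", "xyz")

// With k = 2, k is the limiting factor instead.
val kLimits = cleanTrim("a-b-c-d", "x__y_z", remove, 2)       // ("ab", "xy")
```

In both calls every character of both inputs is filtered, even though only a short prefix ends up in the result. That is the work a lazy version should avoid.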

How would you suggest writing this lazily (for example, with Stream)?

2 answers:

Answer 0: (score: 2)

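The key to doing this lazily is that you can zip the two filtered streams/iterators/views. Zipping traverses both at the same time and cuts the longer one down to the size of the shorter one.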
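
One way to check that the zip idea really avoids scanning the whole input is to count how many characters the filter predicate inspects. The following is a minimal sketch with iterators. The names Counter, countingIter and cleanTrimCounted are hypothetical helpers introduced only for this illustration, and the inputs are made up.

```scala
// Hypothetical helper: records how many characters of one input were inspected.
final class Counter { var inspected = 0 }

// A filtered iterator whose predicate counts every character it looks at.
def countingIter(s: String, chars: Set[Char], counter: Counter): Iterator[Char] =
  s.iterator.filter { c =>
    counter.inspected += 1 // side effect used only to observe laziness
    !chars(c)
  }

def cleanTrimCounted(s1: String, s2: String, chars: Set[Char], k: Int): (String, String, Int, Int) = {
  val cnt1 = new Counter
  val cnt2 = new Counter
  val b1 = new StringBuilder
  val b2 = new StringBuilder
  // zip ends with the shorter filtered iterator; take(k) ends after k pairs.
  countingIter(s1, chars, cnt1).zip(countingIter(s2, chars, cnt2)).take(k).foreach {
    case (a, b) => b1 += a; b2 += b
  }
  (b1.result(), b2.result(), cnt1.inspected, cnt2.inspected)
}

val long1 = "a-b-c-" * 1000 // 6000 characters
val long2 = "-x-y-z" * 1000 // 6000 characters
val (t1, t2, n1, n2) = cleanTrimCounted(long1, long2, Set('-'), 4)
```

Here t1 is "abca" and t2 is "xyzx". The counters n1 and n2 should stay in the single or low double digits rather than reaching 6000, because the iterators are only pulled until four pairs have been produced.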

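I wrote several implementations of this approach to compare the performance of functional and imperative versions. Here is the code for each of them: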

def cleanTrim_Streams(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  def stream(s: String) = s.toStream.filterNot(chars)
  val (stream1, stream2) = stream(s1).zip(stream(s2)).take(k).unzip
  (stream1.mkString, stream2.mkString)
}

def cleanTrim_IteratorsFold(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  def iter(s: String) = s.iterator.filterNot(chars)
  iter(s1).zip(iter(s2)).take(k).foldLeft(("", "")) {
    case ((r1, r2), (c1, c2)) => (r1 + c1, r2 + c2)
  }
}

def cleanTrim_Views(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  def view(s: String) = s.view.filterNot(chars)
  val (v1, v2) = view(s1).zip(view(s2)).take(k).unzip
  (v1.mkString, v2.mkString)
}

def cleanTrim_FullTraverse(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  val cleaned1 = s1.filterNot(chars)
  val cleaned2 = s2.filterNot(chars)
  val k1 = math.min(cleaned1.length, k)
  val k2 = math.min(cleaned2.length, k)
  val n = math.min(k1, k2)
  val t1 = cleaned1.substring(0, n)
  val t2 = cleaned2.substring(0, n)
  (t1, t2)
}

def cleanTrim_IteratorsImperative(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  def iter(s: String) = s.iterator.filterNot(chars)

  val b1 = new StringBuilder
  val b2 = new StringBuilder
  for ((c1, c2) <- iter(s1).zip(iter(s2)).take(k)) {
    b1 += c1
    b2 += c2
  }

  (b1.result(), b2.result())
}

def cleanTrim_Imperative(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
  var i1 = 0
  var i2 = 0

  val b1 = new StringBuilder
  val b2 = new StringBuilder

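  // Both builders grow in lockstep, so this stops once they hold k characters.
  while (b1.size < k && b2.size < k) {

    // Skip the characters that must be removed from each input.
    while (i1 < s1.length && chars.contains(s1(i1))) i1 += 1
    while (i2 < s2.length && chars.contains(s2(i2))) i2 += 1

    // If either input is exhausted, the shorter cleaned string limits the result.
    if (i1 >= s1.length || i2 >= s2.length) return (b1.result(), b2.result())

    // Append one kept character from each input.
    b1 += s1(i1); i1 += 1
    b2 += s2(i2); i2 += 1
  }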

  (b1.result(), b2.result())
}

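Here are the benchmark results for s1.size = 100, s2.size = 200, chars.size = 3, a trimmed size of 86, and k = 50 or 100. (The scores appear to use a comma as the decimal separator.)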

[info] Benchmark                       (maxLength)  Mode  Cnt    Score    Error  Units
[info] Benchmarks.fullTraverse                  50  avgt   10    5,591 ±  2,586  us/op
[info] Benchmarks.fullTraverse                 100  avgt   10    5,678 ±  2,799  us/op
[info] Benchmarks.imperative                    50  avgt   10    1,091 ±  0,066  us/op
[info] Benchmarks.imperative                   100  avgt   10    2,384 ±  0,931  us/op
[info] Benchmarks.iteratorsFold                 50  avgt   10    4,164 ±  0,214  us/op
[info] Benchmarks.iteratorsFold                100  avgt   10   11,783 ±  8,251  us/op
[info] Benchmarks.iteratorsImperative           50  avgt   10    4,104 ±  1,241  us/op
[info] Benchmarks.iteratorsImperative          100  avgt   10    9,695 ±  5,554  us/op
[info] Benchmarks.streams                       50  avgt   10   38,670 ±  3,547  us/op
[info] Benchmarks.streams                      100  avgt   10  116,573 ± 72,291  us/op
[info] Benchmarks.views                         50  avgt   10   17,209 ± 30,554  us/op
[info] Benchmarks.views                        100  avgt   10   17,124 ±  0,818  us/op

Some observations from these results:

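  • Of course, nothing beats the straightforward imperative implementation.
  • The fullTraverse code (taken from your question) is actually quite efficient at the tested data sizes. It may be best to simply use it.
  • Among the lazy functional implementations, iteratorsFold performs best, and it beats fullTraverse at k = 50 (though not at k = 100).
  • The overhead of doing such a simple task lazily is very high.
  • Stream is very inefficient, as expected.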

Answer 1: (score: 1)

You can filter a string with a Stream, like this:

def filterCharsLazy(s: String, chars: Set[Char], k: Int): String = {
  val s2: Stream[Char] = s.toStream
  s2.filter(a => !chars(a)).take(k).mkString
}

Interestingly, filterNot does not seem to allow lazy execution here, so I replaced it with a plain filter.

Tested with:

def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block    // call-by-name
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  result
}
val str = "asdfasdfasdfasdfasdf"*1000000
val chars = Set('a','s','b','c','e','g','h','j','k')

time { filterCharsLazy(str, chars, 10) }
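
If the laziness of filterNot on Stream is in doubt for a given Scala version, one alternative is an Iterator, where both filter and filterNot are evaluated on demand. A minimal sketch follows. The name filterCharsIter is hypothetical and not part of the answer above; the test data is the same as in the timing run.

```scala
// Hypothetical Iterator-based variant of filterCharsLazy.
def filterCharsIter(s: String, chars: Set[Char], k: Int): String =
  s.iterator.filterNot(chars).take(k).mkString

val str = "asdfasdfasdfasdfasdf" * 1000000
val chars = Set('a', 's', 'b', 'c', 'e', 'g', 'h', 'j', 'k')

// Only a short prefix of the 20-million-character string is inspected:
// the iterator is pulled just until 10 characters have been kept.
val firstTen = filterCharsIter(str, chars, 10)
```

Unlike a Stream, an Iterator does not memoize the elements it has produced, which is one plausible reason the iterator variants score better than streams in the benchmark in the first answer.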