这是我之前question的后续内容。
假设我需要从两个输入字符串s1
和s2
中删除某些字符,然后返回其子字符串t1
和t2
,如下所示:
t1
和t2
已“清理”t1
和t2
的长度相同t1
和t2
长度最多为k
t1
和t2
尽可能长我可以编写一个次优的实现来扫描整个输入,如下所示:
def cleanTrim(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
val cleaned1 = s1.filterNot(chars)
val cleaned2 = s2.filterNot(chars)
val k1 = math.min(cleaned1.length, k)
val k2 = math.min(cleaned2.length, k)
val n = math.min(k1, k2)
val t1 = cleaned1.substring(0, n)
val t2 = cleaned2.substring(0, n)
(t1, t2)
}
您如何建议懒惰地写一下(例如Stream
)?
答案 0 :(得分:2)
懒洋洋地执行此操作的关键是,您可以zip
两个过滤后的流/迭代器/视图同时遍历它们并切割较长的一个以使其具有与较短的相同的大小。
我已经完成了这种方法的几种实现,以比较功能和命令式实现的性能。以下是方法的代码:
def cleanTrim_Streams(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
def stream(s: String) = s.toStream.filterNot(chars)
val (stream1, stream2) = stream(s1).zip(stream(s2)).take(k).unzip
(stream1.mkString, stream2.mkString)
}
def cleanTrim_IteratorsFold(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
def iter(s: String) = s.iterator.filterNot(chars)
iter(s1).zip(iter(s2)).take(k).foldLeft(("", "")) {
case ((r1, r2), (c1, c2)) => (r1 + c1, r2 + c2)
}
}
def cleanTrim_Views(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
def view(s: String) = s.view.filterNot(chars)
val (v1, v2) = view(s1).zip(view(s2)).take(k).unzip
(v1.mkString, v2.mkString)
}
def cleanTrim_FullTraverse(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
val cleaned1 = s1.filterNot(chars)
val cleaned2 = s2.filterNot(chars)
val k1 = math.min(cleaned1.length, k)
val k2 = math.min(cleaned2.length, k)
val n = math.min(k1, k2)
val t1 = cleaned1.substring(0, n)
val t2 = cleaned2.substring(0, n)
(t1, t2)
}
def cleanTrim_IteratorsImperative(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
def iter(s: String) = s.iterator.filterNot(chars)
val b1 = new StringBuilder
val b2 = new StringBuilder
for ((c1, c2) <- iter(s1).zip(iter(s2)).take(k)) {
b1 += c1
b2 += c2
}
(b1.result(), b2.result())
}
def cleanTrim_Imperative(s1: String, s2: String, chars: Set[Char], k: Int): (String, String) = {
var i1 = 0
var i2 = 0
val b1 = new StringBuilder
val b2 = new StringBuilder
while (b1.size < k && b2.size < k) {
while (i1 < s1.length && chars.contains(s1(i1))) i1 += 1
while (i2 < s2.length && chars.contains(s2(i2))) i2 += 1
if (i1 >= s1.length || i2 >= s2.length) return (b1.result(), b2.result())
b1 += s1(i1); i1 += 1
b2 += s2(i2); i2 += 1
}
(b1.result(), b2.result())
}
以下是基准测试的结果s1.size
= 100,s2.size
= 200,chars.size
= 3,修剪尺寸= 86且k
= 50或100。
[info] Benchmark (maxLength) Mode Cnt Score Error Units
[info] Benchmarks.fullTraverse 50 avgt 10 5,591 ± 2,586 us/op
[info] Benchmarks.fullTraverse 100 avgt 10 5,678 ± 2,799 us/op
[info] Benchmarks.imperative 50 avgt 10 1,091 ± 0,066 us/op
[info] Benchmarks.imperative 100 avgt 10 2,384 ± 0,931 us/op
[info] Benchmarks.iteratorsFold 50 avgt 10 4,164 ± 0,214 us/op
[info] Benchmarks.iteratorsFold 100 avgt 10 11,783 ± 8,251 us/op
[info] Benchmarks.iteratorsImperative 50 avgt 10 4,104 ± 1,241 us/op
[info] Benchmarks.iteratorsImperative 100 avgt 10 9,695 ± 5,554 us/op
[info] Benchmarks.streams 50 avgt 10 38,670 ± 3,547 us/op
[info] Benchmarks.streams 100 avgt 10 116,573 ± 72,291 us/op
[info] Benchmarks.views 50 avgt 10 17,209 ± 30,554 us/op
[info] Benchmarks.views 100 avgt 10 17,124 ± 0,818 us/op
这些结果中的一些观点:
fullTraverse
代码(取自您的问题)实际上在测试数据大小时非常有效。也许最好使用它。 iteratorsFold
中表现最佳,并击败了fullTraverse
。 Stream
非常低效,正如预期的那样。答案 1 :(得分:1)
您可以使用Streams过滤字符串,如下所示:
def filterCharsLazy(s: String, chars: Set[Char], k: Int): String = {
val s2: Stream[Char] = s.toStream
s2.filter(a => !chars(a)).take(k).mkString
}
有趣的是filterNot
似乎不允许延迟执行,所以我用普通的filter
替换了它。
经过测试:
def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
result
}
val str = "asdfasdfasdfasdfasdf"*1000000
val chars = Set('a','s','b','c','e','g','h','j','k')
time { filterCharsLazy(str, chars, 10) }