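I have a source file containing words and want to do a typical word count. What I am using converts everything to an array and loads it into memory: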
def freqMap(lines: Iterator[String]): Map[String, Int] = {
  val mappedWords: Array[(String, Int)] =
    lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))
  val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) =>
    elements.reduce((x, y) => (y._1, x._2 + y._2))
  }
  frequencies
}
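But I want to evaluate line by line and show output as each line is processed. How can I do this lazily, without putting everything into memory?
Answer 0 (score: 1)
You say you don't want to put everything in memory, but you want to "show output as each line is processed." It sounds like you just want to println the intermediate results.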
lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
  println(mp) // output intermediate results
  line.split(" ").foldLeft(mp){ case (m,word) =>
    m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
  }
}
The iterator (lines) is consumed one line at a time. The Map result is built word by word and carried from line to line as the foldLeft accumulator.
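The same fold can be sketched as a standalone function. This is only an illustration: the name countWords is hypothetical, and it assumes words are separated by whitespace rather than by the question's delimiter.

```scala
// Hypothetical helper (not from the question): folds an iterator of lines
// into word counts. Only the current line and the running Map are held,
// never the whole input.
def countWords(lines: Iterator[String]): Map[String, Int] =
  lines.foldLeft(Map.empty[String, Int]) { (counts, line) =>
    // Assumption: words are whitespace-separated; empty tokens are skipped.
    line.split("\\s+").filter(_.nonEmpty).foldLeft(counts) { (m, word) =>
      m.updated(word, m.getOrElse(word, 0) + 1)
    }
  }
```

For a file, scala.io.Source.fromFile(path).getLines() returns an Iterator[String] that could be passed in directly, so lines are read as the fold advances.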
Answer 1 (score: 0)
I think what you are looking for is the scanLeft method. An example solution might look like this:
val iter = List("this is line number one", "this is line number two", "this this this").toIterator
val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty){
  case (acc, word) =>
    println(word)
    acc.updated(word, acc.getOrElse(word, 0) + 1)
}
It is all lazy and pull-based, so if you execute:
println(solution.take(5).toList)
This will print to the console:
this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
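Since the question asks for output per line rather than per word, the same scanLeft idea can be applied to whole lines. The following is a rough sketch under assumptions: runningCounts is a hypothetical name, and words are taken to be whitespace-separated.

```scala
// Hypothetical helper (not from the answer): lazily yields the cumulative
// word counts after each line has been processed.
def runningCounts(lines: Iterator[String]): Iterator[Map[String, Int]] =
  lines
    .scanLeft(Map.empty[String, Int]) { (acc, line) =>
      // Assumption: words are whitespace-separated; empty tokens are skipped.
      line.split("\\s+").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
    }
    .drop(1) // skip the initial empty Map that scanLeft emits first
```

Nothing is read until the returned iterator is consumed, so something like runningCounts(lines).foreach(println) would print one cumulative Map per input line.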