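Computing a word count in Scala by reading file input line by line?

Time: 2017-06-14 16:59:28

Tags: scala functional-programming immutability

I have a source file containing words and want to do a typical word count. What I currently use converts everything to an array and loads it all into memory: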

def freqMap(lines: Iterator[String]): Map[String, Int] = {

   val mappedWords: Array[(String, Int)] = lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))

   val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) => elements.reduce((x, y) => (y._1, x._2 + y._2)) }

   frequencies
}

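But I want to evaluate line by line and show output as each line is processed. How can I do this lazily, without putting everything into memory?

2 answers:

Answer 0: (score: 1)

You say you don't want to put everything in memory, but you want to "show output as each line is processed". It sounds like you just want to println the intermediate results.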

lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
  println(mp)  // output intermediate results
  line.split(" ").foldLeft(mp){ case (m,word) =>
      m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
  }
}

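The iterator (lines) is consumed only one line at a time. The Map result is built word by word and carried forward line by line as the foldLeft accumulator.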
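For illustration, here is a minimal sketch of the same foldLeft idea wired up to a file. The names FoldWordCount, addLine, and fromFile are hypothetical and not from the question, and the path passed to fromFile is whatever file you want to count. Unlike the snippet above, this version prints the counts after each line rather than before it, and it splits on runs of whitespace, which is an assumption about the input format.

```scala
import scala.io.Source

object FoldWordCount {
  // Fold the words of a single line into the running counts.
  def addLine(counts: Map[String, Int], line: String): Map[String, Int] =
    line.split("\\s+").filter(_.nonEmpty).foldLeft(counts) { (m, word) =>
      m.updated(word, m.getOrElse(word, 0) + 1)
    }

  // Consume the iterator one line at a time, printing the counts after each line.
  def freqMap(lines: Iterator[String]): Map[String, Int] =
    lines.foldLeft(Map.empty[String, Int]) { (counts, line) =>
      val updated = addLine(counts, line)
      println(updated) // intermediate result after this line
      updated
    }

  // Hypothetical helper: getLines() hands out lines on demand,
  // so the file is never held in memory as a whole.
  def fromFile(path: String): Map[String, Int] = {
    val source = Source.fromFile(path)
    try freqMap(source.getLines()) finally source.close()
  }
}
```

Only the current line and the accumulated Map are alive at any point, so memory use should grow with the number of distinct words rather than with the size of the file.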

Answer 1: (score: 0)

I think what you are looking for is the scanLeft method. An example solution might look like this:

val iter = List("this is line number one", "this is line number two", "this this this").toIterator

  val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty){
    case (acc, word) =>
      println(word)
      acc.updated(word, acc.getOrElse(word, 0) + 1)
  }

It is all lazy and pull-based. If you run println(solution.take(5).toList), this prints to the console:

this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
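Since the question asks for output per line rather than per word, the same scanLeft idea can be applied at line level. The following is only a sketch under that assumption; RunningWordCount and runningCounts are hypothetical names, and splitting on a single space mirrors the snippet above.

```scala
object RunningWordCount {
  // Produce one snapshot of the counts per input line.
  def runningCounts(lines: Iterator[String]): Iterator[Map[String, Int]] =
    lines
      .scanLeft(Map.empty[String, Int]) { (acc, line) =>
        // Fold this line's words into the counts accumulated so far.
        line.split(" ").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
          m.updated(word, m.getOrElse(word, 0) + 1)
        }
      }
      .drop(1) // skip the initial empty Map emitted by scanLeft
}
```

Calling RunningWordCount.runningCounts(lines).foreach(println) would then print the running counts once per line. The result is still an Iterator, so lines are pulled on demand, although exactly how far ahead the underlying iterator is read can differ between Scala versions.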