Word count example in Scala?

Date: 2012-12-29 13:27:09

Tags: string scala associative-array

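I see a lot of Scala tutorials whose examples do things like recursive traversals or solving math problems. In my day-to-day programming, though, most of my coding time seems to go into mundane tasks such as string manipulation, database queries and date handling. Would anyone be interested in writing a Scala version of the following Perl script?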

#!/usr/bin/perl
use strict;
use warnings;
# Opens a file with one word on each line and counts the number of
# occurrences of each word, case-insensitively
print "Enter the name of your file, ie myfile.txt:\n";
my $val = <STDIN>;
chomp($val);
open(my $fh, '<', $val) or die "wrong filename: $!";

my %count = ();
while (my $line = <$fh>)
{
    chomp($line);
    $count{lc $line}++;
}
close($fh);

print "Number of instances found of:\n";
foreach my $word (sort keys %count) {
    print "$word\t: " . $count{$word} . " \n";
}

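Summary:

  • Ask for a file name
  • Read the file (one word per line)
  • Strip the line endings (CR, LF or CRLF)
  • Lowercase each word
  • Increment the word's count
  • Print every word with its count, sorted alphabetically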
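The steps in the summary map fairly directly onto a small pure function plus a thin I/O wrapper. This is a minimal sketch, assuming the whole file fits in memory; `WordCountSteps` and `countWords` are hypothetical names, not part of any library. Keeping the counting separate from the file and console handling makes it easy to test.

```scala
// Sketch of the summary steps; WordCountSteps / countWords are hypothetical names.
object WordCountSteps {

  /** Counts each word case-insensitively. `text` holds one word per line,
    * with lines separated by CR, LF or CRLF. Result is sorted alphabetically. */
  def countWords(text: String): List[(String, Int)] = {
    // Strip line endings: split on CRLF, CR or LF (CRLF first so it is one separator).
    // Java's split drops trailing empty strings, so a final newline adds no "" word.
    val lines = text.split("\r\n|\r|\n").toList
    val counts = lines
      .map(_.toLowerCase)                              // lowercase each word
      .foldLeft(Map.empty[String, Int]) { (acc, w) =>  // increment its count
        acc.updated(w, acc.getOrElse(w, 0) + 1)
      }
    counts.toList.sortBy(_._1)                         // sort alphabetically
  }

  def main(args: Array[String]): Unit = {
    // Ask for a file name and read the whole file
    println("Enter the name of your file, ie myfile.txt:")
    val fileName = scala.io.StdIn.readLine()
    val source = scala.io.Source.fromFile(fileName)
    val text = try source.mkString finally source.close()

    println("Number of instances found of:")
    for ((word, n) <- countWords(text)) println(s"$word\t: $n")
  }
}
```

An immutable `Map` threaded through `foldLeft` plays the role of Perl's `%count` hash; `getOrElse(w, 0)` stands in for Perl's autovivification of a missing key as 0.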

TIA

3 answers:

Answer 0 (score: 10)

A simple word count like this can be written as follows:

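import scala.io.{Source, StdIn}
import java.io.FileNotFoundException

object WC {

  def main(args: Array[String]): Unit = {
    println("Enter the name of your file, ie myfile.txt:")
    val fileName = StdIn.readLine()

    // Read all lines, lowercased and trimmed (trim also removes a stray CR),
    // and close the file once everything has been read
    val words = try {
      val source = Source.fromFile(fileName)
      try source.getLines().map(_.toLowerCase.trim).toList
      finally source.close()
    } catch {
      case e: FileNotFoundException =>
        sys.error("No file named %s found".format(fileName))
    }

    // Group identical words and count the size of each group
    val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

    println("Number of instances found of:")
    // Sort by word so the output is alphabetical, as in the Perl script
    for ((word, count) <- counts.toSeq.sortBy(_._1))
      println("%s\t: %d".format(word, count))
  }

}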
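Answer 0 counts with `groupBy(identity)`, which first builds a list of every occurrence of each word and only then takes its size. As an alternative (a swapped-in technique, not the answer's), Scala 2.13 added `groupMapReduce`, which groups and counts in one pass, and a `SortedMap` keeps the words in alphabetical order. A minimal sketch, assuming Scala 2.13 or later; `WC213` is a hypothetical name.

```scala
import scala.collection.immutable.SortedMap

// Requires Scala 2.13+ (groupMapReduce, SortedMap.from). WC213 is a hypothetical name.
object WC213 {

  /** Lowercased, trimmed word -> number of occurrences, in alphabetical order. */
  def countWords(lines: Iterator[String]): SortedMap[String, Int] =
    SortedMap.from(
      lines
        .map(_.trim.toLowerCase)
        .toSeq                                  // groupMapReduce is not defined on Iterator
        .groupMapReduce(identity)(_ => 1)(_ + _) // each word contributes 1, summed per key
    )
}
```

It can replace the `groupBy` step in answer 0, e.g. `for ((word, count) <- WC213.countWords(source.getLines())) println(s"$word\t: $count")`, with no separate sort needed.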

Answer 1 (score: 3)

If you want it short and compact, the Scala 2.10 features util.Try and string interpolation allow the following (io.StdIn.readLine() needs Scala 2.11 or later):

// Opens a file with one word on each line and counts
// the number of occurrences of each word (case-insensitive)
object WordCount extends App {
  println("Enter the name of your file, e.g. myfile.txt: ")
  val lines = util.Try(io.Source.fromFile(io.StdIn.readLine()).getLines().toList)
    .getOrElse(sys.error("Wrong filename."))
  println("Number of instances found of:")
  lines.map(_.trim.toLowerCase).groupBy(identity).toSeq
    .map { case (w, ws) => s"$w\t: ${ws.size}" }.sorted.foreach(println)
}
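Answer 1 never closes the `Source` it opens. In Scala 2.13 and later, `scala.util.Using` closes the resource automatically and, like the answer's `util.Try`, returns a `Try`, so a missing file comes back as a `Failure`. A minimal sketch, assuming Scala 2.13+; `WordCountUsing` and `report` are hypothetical names.

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Requires Scala 2.13+ for scala.util.Using. WordCountUsing/report are hypothetical names.
object WordCountUsing {

  /** Alphabetically sorted "word\t: count" lines for the file at `path`.
    * The Source is always closed; a missing or unreadable file yields a Failure. */
  def report(path: String): Try[Seq[String]] =
    Using(Source.fromFile(path)) { src =>
      src.getLines().map(_.trim.toLowerCase).toList // read everything before the close
        .groupBy(identity).toSeq
        .map { case (w, ws) => s"$w\t: ${ws.size}" }
        .sorted
    }
}
```

A caller could then do `WordCountUsing.report(io.StdIn.readLine()).fold(_ => sys.error("Wrong filename."), _.foreach(println))`.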

Answer 2 (score: 1)

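  val lines: List[String] = List("this is line one", "this is line 2", "this is line three")

  // Split every line on spaces and flatten the pieces into one list of words
  val words: List[String] = lines.flatMap(_.split(" "))

  // Group identical words, then print each word with the size of its group
  words.groupBy(identity).toList.foreach { case (word, ws) => println(word + "," + ws.size) }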

Prints (the order of the lines may vary, because groupBy returns an unordered Map):

this,3
is,3
three,1
line,3
2,1
one,1
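Answer 2 splits on single spaces, so two consecutive spaces or a tab would yield empty or merged "words", and it counts case-sensitively in no particular order. A minimal sketch that splits on runs of whitespace, lowercases, and sorts the result, assuming the same in-memory list of lines; `LineWordCount` is a hypothetical name.

```scala
// LineWordCount is a hypothetical name; works on Scala 2.13 and Scala 3.
object LineWordCount {

  /** Case-insensitive word counts over free-text lines, sorted by word. */
  def count(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.trim.split("\\s+"))          // split on runs of spaces/tabs
      .filter(_.nonEmpty)                     // a blank line splits to one "" entry
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }
      .toSeq
      .sortBy(_._1)                           // alphabetical, so output is stable

  def main(args: Array[String]): Unit = {
    val lines = List("this is line one", "this is line 2", "this is line three")
    count(lines).foreach { case (w, n) => println(s"$w,$n") }
  }
}
```

Sorting makes the output order deterministic, unlike iterating the `Map` returned by `groupBy` directly.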