Counting words in a list

Asked: 2015-02-27 11:27:31

Tags: scala

I have a list of strings:

 val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")

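The first element of each CSV string is a key. For each key, I am trying to count how many times each word occurs in the strings that share that key, excluding the key itself.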

So the list above should be converted to:

List("a1,1,2,1,0,0", "a2,1,0,0,1,1")    

For "a1": a occurs once, b occurs twice, c occurs once, d occurs 0 times, e occurs 0 times.
For "a2": a occurs once, b occurs 0 times, c occurs 0 times, d occurs once, e occurs once.
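The column order is not stated explicitly. Reading it off the expected output, it looks like the distinct non-key words in sorted order (that is an assumption, and `ColumnOrder` is just an illustrative name). A minimal sketch of how those columns could be derived:

```scala
object ColumnOrder {
  val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")

  // Drop the key (first field) of every line, then keep each word once, sorted.
  // Assumption: the output columns follow this sorted vocabulary.
  val columns: List[String] =
    file.flatMap(_.split(",").toList.drop(1)).distinct.sorted

  def main(args: Array[String]): Unit =
    println(columns.mkString(",")) // prints a,b,c,d,e
}
```

Each output row is then the key followed by one count per column, with 0 where the key never saw that word.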

Here is what I have so far:

    def getTail[T](l : List[T]) = l match {
        case h::t => t
    }                                             //> getTail: [T](l: List[T])List[T]

    def getHead[T](l : List[T]) = l match {
        case h::t => h
    }                                             //> getHead: [T](l: List[T])T

    val arr = file.map(m => (getHead(m.split(",").toList) , getTail(m.split(",").toList)))
                                                  //> arr  : List[(String, List[String])] = List((a1,List(a, b)), (a1,List(c, b)),
                                                  //|  (a2,List(a, d, e)))

    val g = arr.groupBy(_._1)                     //> g  : scala.collection.immutable.Map[String,List[(String, List[String])]] = M
                                                  //| ap(a1 -> List((a1,List(a, b)), (a1,List(c, b))), a2 -> List((a2,List(a, d, e
                                                  //| ))))
    val keysRemoved = g.mapValues(v => v.map (v2 => v2._2).flatten)
                                                  //> keysRemoved  : scala.collection.immutable.Map[String,List[String]] = Map(a1 
                                                  //| -> List(a, b, c, b), a2 -> List(a, d, e))
    val associateOne = keysRemoved.mapValues(v => v.map(m => (m , 1)))

    val counted = keysRemoved.mapValues(v => v.map(m => (m , 1)))
                                                  //> counted  : scala.collection.immutable.Map[String,List[(String, Int)]] = Map(
                                                  //| a1 -> List((a,1), (b,1), (c,1), (b,1)), a2 -> List((a,1), (d,1), (e,1)))

But I'm not sure how to count the occurrences of each element in the List, or how to return 0 for words that a given List does not contain.

2 Answers:

Answer 0 (score: 1)

val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")
val splitCsvs = file.map { line =>
  val split = line.split(",")   // split up the csv line
  split.head -> split.tail      // separate the key from the words
}
val collapsed = splitCsvs.groupBy(_._1).mapValues(_.flatMap(_._2)) // group by key
val allWords = collapsed.flatMap(_._2).toVector.distinct.sorted // get all unique words

val result = collapsed.map {
  case (head, tail) =>
    val counts = tail.groupBy(identity).mapValues(_.size).withDefaultValue(0) // count
    (head +: allWords.map(counts)).mkString(",") // make counts string, with the key
}

result.foreach(println)

Prints:

a1,1,2,1,0,0
a2,1,0,0,1,1
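A note on versions: as far as I know, on Scala 2.13 and later `mapValues` is deprecated and returns a lazy view, so chaining `.withDefaultValue(0)` after it may not compile as written. The sketch below is a variant of the same idea that sticks to plain `map` and `getOrElse`, which should behave the same across versions. The name `PortableCounts` and the choice to emit keys in first-seen order are my own assumptions, not part of the answer above.

```scala
object PortableCounts {
  val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")

  // Split each line into (key, words).
  val pairs: List[(String, List[String])] = file.map { line =>
    val fields = line.split(",").toList
    (fields.head, fields.tail)
  }

  // Keys in first-seen order, and the sorted vocabulary that defines the columns.
  val keys: List[String] = pairs.map(_._1).distinct
  val allWords: List[String] = pairs.flatMap(_._2).distinct.sorted

  // For each key, a strict Map from word to count, built with plain `map`.
  val counts: Map[String, Map[String, Int]] =
    pairs.groupBy(_._1).map { case (key, group) =>
      val words = group.flatMap(_._2)
      key -> words.groupBy(identity).map { case (w, ws) => w -> ws.size }
    }

  // getOrElse supplies the 0 for words a key never saw.
  val result: List[String] = keys.map { key =>
    (key :: allWords.map(w => counts(key).getOrElse(w, 0).toString)).mkString(",")
  }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

Looking up with `getOrElse(w, 0)` at the point of use plays the same role as `withDefaultValue(0)`, without depending on what type `mapValues` returns.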

Answer 1 (score: 1)

I won't pretend this is the simplest solution, but it was fun to code. Nice question.

  val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")
                                                  //> file  : List[String] = List(a1,a,b, a1,c,b, a2,a,d,e)
  val xs = file.map(_.split(',').toList)          //> xs  : List[List[String]] = List(List(a1, a, b), List(a1, c, b), List(a2, a, 
                                                  //| d, e))
  val (hs, ts) = xs.unzip { case(h::t) => (h, t) }//> hs  : List[String] = List(a1, a1, a2)
                                                  //| ts  : List[List[String]] = List(List(a, b), List(c, b), List(a, d, e))
  val ks = hs.distinct                            //> ks  : List[String] = List(a1, a2)
  val vs = ts.flatten.distinct                    //> vs  : List[String] = List(a, b, c, d, e)

  val matrix =
    xs.map { case(h::t) => h -> t.map(_ -> 1) }
      .groupBy(_._1)
      .mapValues(_.flatMap(_._2)
                  .groupBy(_._1)
                  .mapValues(_.map(_._2)
                              .reduce(_ + _))
                  .withDefaultValue(0))           //> matrix  : scala.collection.immutable.Map[String,scala.collection.immutable.M
                                                  //| ap[String,Int]] = Map(a1 -> Map(b -> 2, a -> 1, c -> 1), a2 -> Map(e -> 1, d
                                                  //|  -> 1, a -> 1))
  ks.map { k => (k::vs.map(matrix(k))).mkString(",") }
                                                  //> res0: List[String] = List(a1,1,2,1,0,0, a2,1,0,0,1,1)
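For comparison, the same key-to-word-count matrix can also be built in a single pass with `foldLeft` instead of nested `groupBy` calls. This is only a sketch of an alternative, not the method used above: `FoldCounts`, `addLine` and `rows` are hypothetical names, and sorting the columns is an assumption (the code above keeps words in first-seen order, which happens to coincide for this input).

```scala
object FoldCounts {
  val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")

  type Counts = Map[String, Map[String, Int]]

  // Add every word of one line to the running counts for that line's key.
  def addLine(acc: Counts, line: String): Counts =
    line.split(",").toList match {
      case key :: words =>
        val updated = words.foldLeft(acc.getOrElse(key, Map.empty[String, Int])) {
          (m, w) => m.updated(w, m.getOrElse(w, 0) + 1)
        }
        acc.updated(key, updated)
      case Nil => acc // a line with no fields leaves the counts unchanged
    }

  val matrix: Counts = file.foldLeft(Map.empty[String, Map[String, Int]])(addLine)

  // Keys in first-seen order; columns are the sorted distinct words (assumption).
  val ks: List[String] = file.map(_.takeWhile(_ != ',')).distinct
  val vs: List[String] = matrix.values.flatMap(_.keys).toList.distinct.sorted

  val rows: List[String] = ks.map { k =>
    (k :: vs.map(w => matrix(k).getOrElse(w, 0).toString)).mkString(",")
  }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```

The fold avoids the intermediate `(word, 1)` pairs and the partial pattern matches on `h::t`, at the cost of a slightly longer helper.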