I have a list of strings:
val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")
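Each string is a CSV line whose first element is a key. For each key, I am trying to count how many times each word occurs in the remaining elements of the lines that share that key.
So the list above should be converted to: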
List("a1,1,2,1,0,0", "a2,1,0,0,1,1")
The count columns are in the order a, b, c, d, e.
For "a1": a occurs once, b occurs twice, c occurs once, d occurs 0 times, e occurs 0 times.
For "a2": a occurs once, b occurs 0 times, c occurs 0 times, d occurs once, e occurs once.
This is what I have so far:
def getTail[T](l : List[T]) = l match {
case h::t => t
} //> getTail: [T](l: List[T])List[T]
def getHead[T](l : List[T]) = l match {
case h::t => h
} //> getHead: [T](l: List[T])T
val arr = file.map(m => (getHead(m.split(",").toList) , getTail(m.split(",").toList)))
//> arr : List[(String, List[String])] = List((a1,List(a, b)), (a1,List(c, b)),
//| (a2,List(a, d, e)))
val g = arr.groupBy(_._1) //> g : scala.collection.immutable.Map[String,List[(String, List[String])]] = M
//| ap(a1 -> List((a1,List(a, b)), (a1,List(c, b))), a2 -> List((a2,List(a, d, e
//| ))))
val keysRemoved = g.mapValues(v => v.map (v2 => v2._2).flatten)
//> keysRemoved : scala.collection.immutable.Map[String,List[String]] = Map(a1
//| -> List(a, b, c, b), a2 -> List(a, d, e))
val associateOne = keysRemoved.mapValues(v => v.map(m => (m , 1)))
val counted = keysRemoved.mapValues(v => v.map(m => (m , 1)))
//> counted : scala.collection.immutable.Map[String,List[(String, Int)]] = Map(
//| a1 -> List((a,1), (b,1), (c,1), (b,1)), a2 -> List((a,1), (d,1), (e,1)))
But I am not sure how to count the occurrences of each element of a List, and how to return 0 for words that a List does not contain?
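To make the two missing steps concrete, here is a minimal sketch of counting the elements of one list and then reading the counts back with 0 for absent words. The names `CountWithDefault`, `countWords` and `vocabulary` are made up for illustration and are not part of the code above.

```scala
object CountWithDefault {
  // Group equal words together, then replace each group by its size.
  def countWords(words: List[String]): Map[String, Int] =
    words.groupBy(identity).map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // The words collected for key "a1" in the question.
    val counts = countWords(List("a", "b", "c", "b"))

    // Hypothetical fixed column order; getOrElse supplies 0 for unseen words.
    val vocabulary = List("a", "b", "c", "d", "e")
    println(vocabulary.map(w => counts.getOrElse(w, 0)).mkString(","))
  }
}
```

Looking each word up with `getOrElse(w, 0)` is one way to get the zeros; `withDefaultValue(0)` on an immutable `Map` is another.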
Answer 0 (score: 1)
val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")
val splitCsvs = file.map { line =>
val split = line.split(",") // split up the csv line
split.head -> split.tail // separate the key from the words
}
val collapsed = splitCsvs.groupBy(_._1).mapValues(_.flatMap(_._2)) // group by key
val allWords = collapsed.flatMap(_._2).toVector.distinct.sorted // get all unique words
val result = collapsed.map {
case (head, tail) =>
val counts = tail.groupBy(identity).mapValues(_.size).withDefaultValue(0) // count
(head +: allWords.map(counts)).mkString(",") // make counts string, with the key
}
result.foreach(println)
Prints:
a1,1,2,1,0,0
a2,1,0,0,1,1
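One caveat, as far as I know: on Scala 2.13 and later, `mapValues` is deprecated and returns a lazy `MapView`, which has no `withDefaultValue`, so the snippet above may not compile there unchanged. A sketch of the same steps that avoids `mapValues` follows. `WordCountsPerKey` and `countsPerKey` are illustrative names, and sorting the keys is my own addition to make the output order deterministic.

```scala
object WordCountsPerKey {
  def countsPerKey(lines: List[String]): List[String] = {
    // Split each CSV line into (key, words).
    val keyed = lines.map { line =>
      val fields = line.split(",").toList
      fields.head -> fields.tail
    }

    // Every distinct word seen under any key, sorted to fix the column order.
    val allWords = keyed.flatMap(_._2).distinct.sorted

    // Merge the rows that share a key, count the words, and emit one line per key.
    keyed.groupBy(_._1).toList.sortBy(_._1).map { case (key, rows) =>
      val words  = rows.flatMap(_._2)
      val counts = words.groupBy(identity).map { case (w, ws) => w -> ws.size }
      (key :: allWords.map(w => counts.getOrElse(w, 0).toString)).mkString(",")
    }
  }

  def main(args: Array[String]): Unit =
    countsPerKey(List("a1,a,b", "a1,c,b", "a2,a,d,e")).foreach(println)
}
```

This sketch assumes every line has a non-empty key before the first comma.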
Answer 1 (score: 1)
I won't pretend this is the simplest solution, but it was fun to code. Nice question.
val file: List[String] = List("a1,a,b", "a1,c,b", "a2,a,d,e")
//> file : List[String] = List(a1,a,b, a1,c,b, a2,a,d,e)
val xs = file.map(_.split(',').toList) //> xs : List[List[String]] = List(List(a1, a, b), List(a1, c, b), List(a2, a,
//| d, e))
val (hs, ts) = xs.unzip { case(h::t) => (h, t) }//> hs : List[String] = List(a1, a1, a2)
//| ts : List[List[String]] = List(List(a, b), List(c, b), List(a, d, e))
val ks = hs.distinct //> ks : List[String] = List(a1, a2)
val vs = ts.flatten.distinct //> vs : List[String] = List(a, b, c, d, e)
val matrix =
xs.map { case(h::t) => h -> t.map(_ -> 1) }
.groupBy(_._1)
.mapValues(_.flatMap(_._2)
.groupBy(_._1)
.mapValues(_.map(_._2)
.reduce(_ + _))
.withDefaultValue(0)) //> matrix : scala.collection.immutable.Map[String,scala.collection.immutable.M
//| ap[String,Int]] = Map(a1 -> Map(b -> 2, a -> 1, c -> 1), a2 -> Map(e -> 1, d
//| -> 1, a -> 1))
ks.map { k => (k::vs.map(matrix(k))).mkString(",") }
//> res0: List[String] = List(a1,1,2,1,0,0, a2,1,0,0,1,1)
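For comparison, the nested key -> word -> count map can also be built in a single pass with `foldLeft`. This is only a sketch of an alternative, not the approach used above; `CountMatrix` and `build` are hypothetical names. It also handles the `Nil` case, because a `case h::t` match on its own throws a `MatchError` when the split yields an empty list.

```scala
object CountMatrix {
  type Matrix = Map[String, Map[String, Int]]

  // Fold over the lines, bumping the counter for each (key, word) pair.
  def build(lines: List[String]): Matrix =
    lines.foldLeft(Map.empty[String, Map[String, Int]]) { (acc, line) =>
      line.split(",").toList match {
        case key :: words =>
          val row = words.foldLeft(acc.getOrElse(key, Map.empty[String, Int])) { (r, w) =>
            r.updated(w, r.getOrElse(w, 0) + 1)
          }
          acc.updated(key, row)
        case Nil =>
          acc // a line such as "," splits to an empty array; skip it
      }
    }

  def main(args: Array[String]): Unit = {
    val matrix = build(List("a1,a,b", "a1,c,b", "a2,a,d,e"))
    val words  = matrix.values.flatMap(_.keys).toList.distinct.sorted
    matrix.keys.toList.sorted.foreach { k =>
      println((k :: words.map(w => matrix(k).getOrElse(w, 0).toString)).mkString(","))
    }
  }
}
```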