In Scala, every value in the map is empty

Time: 2015-03-26 08:32:49

Tags: scala apache-spark

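I have just started programming in Scala. I am also using Apache Spark to read a file, moviesFile. In the code below, I update a mutable map inside the foreach function. The map is updated while foreach is running, but once foreach returns, the values are gone.

How can I make the values persist in the map variable movieMap?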

val movieMap = scala.collection.mutable.Map[String,String]()
val movie = moviesFile.map(_.split("::")).foreach { x =>
  x.mkString(" ")
  val movieid = x(0)
  val title = x(1)
  val genre = x(2)
  val value = title+","+genre
  movieMap(movieid.toString()) = value.toString()
  println(movieMap.keySet)
}
println(movieMap.keySet)
println(movieMap.get("29"))

1 answer:

Answer 0 (score: 3)

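I believe you are using Spark in a fundamentally wrong way. The function you pass to foreach does not run against the movieMap in your driver program: Spark serializes the function, together with the movieMap it captures, and ships it to the executors. Each task then updates its own deserialized copy of the map, so the driver's movieMap is never touched and is still empty after foreach returns. If you want to use Spark, you have to work with Spark's distributed data structures.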
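To see why the driver's map stays empty, the following plain-Scala sketch (no Spark needed) models what happens to a captured variable: it round-trips a mutable map through Java serialization, roughly as Spark does when it ships a closure to a task, and then updates the copy. This is a simplified model, not Spark's actual code path; `ClosureCopyDemo` and `roundTrip` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Round-trip an object through Java serialization. This roughly models how a
  // closure and the variables it captures reach a task: as a separate copy.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverMap = mutable.Map[String, String]()

    // The "task" works on its own deserialized copy of the captured map
    val taskCopy = roundTrip(driverMap)
    taskCopy("29") = "Title,Genre"

    println(taskCopy.keySet)   // the copy contains the update
    println(driverMap.keySet)  // the driver's map is still empty
  }
}
```

This mirrors the symptom in the question: the println inside foreach sees the updated copy, while the println after foreach looks at the untouched original.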

I suggest sticking with Spark's distributed, parallelized data structures (RDDs). An RDD of ( key, value ) pairs implicitly provides some Map-like functionality.

import org.apache.spark.SparkContext._  // pair-RDD implicits; only required before Spark 1.3
import scala.collection.mutable

// Assume sc is the SparkContext instance

val moviesFileRdd = sc.textFile("movies.txt")

// moviesRdd is an RDD[(String, String)], which acts as a Map-like collection of (key, value) pairs
val moviesRdd = moviesFileRdd.map { line =>
  val splitLine = line.split("::")
  val movieId = splitLine(0)
  val title = splitLine(1)
  val genre = splitLine(2)
  val value = title + ", " + genre
  (movieId, value)
}

// An RDD[(String, String)] offers some Map-like operations.
// lookup returns all values stored under key "29" as a Seq[String]
val listOfValuesWithKey29 = moviesRdd.lookup("29")

// I'm not sure why you would need it, but if you really need a Map on the driver:
// collectAsMap loads all the data into driver memory, so only use it for small datasets
val moviesMap = moviesRdd.collectAsMap()

// moviesMap is typed as a read-only scala.collection.Map; if you need a mutable Map, copy it
val moviesMutableMap = mutable.Map(moviesMap.toList: _*)
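To check the parsing and the Map-like operations without a cluster, here is a Spark-free sketch on a plain Scala Seq. It is only a local approximation: `MoviePairsLocal`, `parseMovie`, the local `lookup` helper (not Spark's `RDD.lookup`) and the sample lines are hypothetical; on a real RDD you would use `moviesRdd.lookup` and `collectAsMap` as in the answer's code.

```scala
import scala.collection.mutable

object MoviePairsLocal {
  // Same parsing as the RDD version: "MovieID::Title::Genres" -> (MovieID, "Title, Genres")
  def parseMovie(line: String): (String, String) = {
    val fields = line.split("::")
    (fields(0), fields(1) + ", " + fields(2))
  }

  // Local stand-in for lookup: every value stored under `key`, in input order
  def lookup(pairs: Seq[(String, String)], key: String): Seq[String] =
    pairs.collect { case (k, v) if k == key => v }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines standing in for movies.txt
    val lines = Seq(
      "1::Title A::Comedy",
      "29::Title B::Drama|Sci-Fi"
    )
    val pairs = lines.map(parseMovie)

    println(lookup(pairs, "29"))

    // Stand-in for collectAsMap: one value per key (with toMap, the last duplicate wins)
    val moviesMap: Map[String, String] = pairs.toMap

    // Mutable copy, built the same way as in the answer
    val moviesMutableMap = mutable.Map(moviesMap.toList: _*)
    moviesMutableMap("30") = "Title C, Animation"
    println(moviesMutableMap.get("29"))
  }
}
```

The sketch makes one difference visible: `lookup` keeps every value for a key, while turning the pairs into a Map keeps only one, which matches Spark's documentation that `collectAsMap` does not return a multimap.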