I am just getting started with Scala programming. I am also using Apache Spark to read a file, moviesFile. In the code below, I update a mutable map inside the foreach function. The map is updated within foreach, but once foreach exits, those values are gone.

How can I make the values persist in the map variable movieMap?
val movieMap = scala.collection.mutable.Map[String, String]()

val movie = moviesFile.map(_.split("::")).foreach { x =>
  x.mkString(" ")
  val movieid = x(0)
  val title = x(1)
  val genre = x(2)
  val value = title + "," + genre
  movieMap(movieid.toString()) = value.toString()
  println(movieMap.keySet)
}
println(movieMap.keySet)
println(movieMap.get("29"))
Answer 0 (score: 3)
I believe you are using Spark in a very wrong way. If you want to use Spark, you have to work with Spark's distributed data structures. (The closure you pass to foreach on an RDD runs on the executors, so mutations it makes to a map living on the driver are simply lost.)

I suggest sticking with Spark's distributed and parallelized data structures (RDDs). An RDD of (key, value) pairs implicitly provides some Map-like functionality.
import org.apache.spark.SparkContext._

// Assume sc is the SparkContext instance
val moviesFileRdd = sc.textFile("movies.txt")

// moviesRdd is an RDD[(String, String)], which acts as a Map-like thing of (key, value) pairs
val moviesRdd = moviesFileRdd.map { line =>
  val splitLine = line.split("::")
  val movieId = splitLine(0)
  val title = splitLine(1)
  val genre = splitLine(2)
  val value = title + ", " + genre
  (movieId, value)
}
// You see... an RDD[(String, String)] offers some Map-like operations.

// Get a list of all values with key "29"
val listOfValuesWithKey29 = moviesRdd.lookup("29")

// I don't know why you would need it, but if you really need a map here:
val moviesMap = moviesRdd.collectAsMap

// moviesMap will be an immutable Map; in case you need a mutable Map:
import scala.collection.mutable
val moviesMutableMap = mutable.Map(moviesMap.toList: _*)
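To see what the map-building step above produces without spinning up a Spark cluster, here is a minimal plain-Scala sketch of the same parsing logic applied to a local collection. The sample lines are hypothetical and only assume the "movieid::title::genre" format from the question; the resulting Map mirrors what collectAsMap would hand back on the driver.

```scala
// Hypothetical sample lines in the assumed "movieid::title::genre" format
val sampleLines = Seq(
  "1::Toy Story::Animation",
  "29::City of Lost Children::Fantasy"
)

// Same per-line logic as the RDD's map(), run on an ordinary Scala Seq
def parseMovies(lines: Seq[String]): Map[String, String] =
  lines.map { line =>
    val fields = line.split("::")
    // key = movie id, value = "title, genre"
    fields(0) -> (fields(1) + ", " + fields(2))
  }.toMap

val movieMap = parseMovies(sampleLines)
println(movieMap.get("29"))  // Some(City of Lost Children, Fantasy)
```

Because the map is built on the driver from an already-materialized collection, the lookup after the loop works as the asker expected, unlike mutating a driver-side map from inside an RDD's foreach.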