Scala Spark does not return a value outside the loop

Asked: 2016-12-18 19:45:25

Tags: scala apache-spark

I am using Apache Spark to extract the top 10 movies and their genres from the MovieLens dataset. I can extract the ratings, but each movie's genres come in the format Action|War|Crime. I split the string on |, then try to insert the pieces into an ArrayBuffer/ListBuffer. This works fine inside the loop, but when I try to read the elements outside the loop, I get an empty result.
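As an aside on the splitting step: in Scala (as in Java), `split('|')` with a Char argument splits on the literal character, while `split("|")` with a String argument is treated as a regular expression, where `|` is the alternation operator. A minimal standalone sketch (the sample string is made up for illustration):

```scala
val genres = "Action|War|Crime"

// Char overload: splits on the literal '|' character
val parts = genres.split('|')   // Array("Action", "War", "Crime")

// String overload: "|" is a regex (empty-or-empty), so it matches
// between every character and the result is one element per character
val broken = genres.split("|")

println(parts.mkString(", "))
println(broken.length)          // much larger than 3
```

If you want the String overload, escape the pipe: `genres.split("\\|")`.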

    import scala.collection.mutable.ListBuffer
    import org.apache.spark.sql.functions.col

    // read the ratings file
    val rating_data = spark.read.format("csv").option("header", "true").load("data/ratings.csv")

    // read the movie name file
    val movie_name = spark.read.format("csv").option("header", "true").load("data/movies.csv")

    // movies with rating 5: match movieId across both CSV files and take the
    // 10 movies with the highest count of 5-star ratings
    val movie_id = rating_data.filter(rating_data("rating") === 5)
      .groupBy("movieId").count()
      .orderBy(col("count").desc)
      .take(10)

    var mylist = ListBuffer[String]()
    var genre_list = Array[String]()

    movie_id.foreach(a => {
      // match the movie id from ratings against the movies file
      val mv = movie_name.filter(movie_name("movieId") === a(0))

      mv.foreach(b => {
        genre_list = b(2).toString.split('|').map(_.trim)
        genre_list.foreach(g => {
          mylist += g
          // mylist is not empty here, it has elements
        })
      })
    })
    println(mylist) // empty
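The likely culprit is that `mv` is a DataFrame, so `mv.foreach` ships its closure to the executors; each executor appends to its own deserialized copy of `mylist`, and the driver's copy (the one `println` sees) is never touched. The outer `movie_id.foreach` is a plain local Array loop, which is why the prints inside appear to work. A hedged sketch of a driver-side alternative, assuming the same `spark` session and that the genres column in movies.csv is named `genres` (the standard MovieLens layout); `collect()` is safe here because the join involves at most 10 movies:

```scala
import org.apache.spark.sql.functions.{col, explode, split}

// top 10 movies by count of 5-star ratings, kept as a DataFrame
val top10 = rating_data
  .filter(col("rating") === 5)
  .groupBy("movieId").count()
  .orderBy(col("count").desc)
  .limit(10)

// join against the movies file, split the pipe-delimited genres column
// on the cluster, then bring the small result back to the driver
val genres = top10
  .join(movie_name, "movieId")
  .select(explode(split(col("genres"), "\\|")).as("genre"))
  .collect()
  .map(_.getString(0))
  .toList

println(genres) // populated on the driver
```

Doing the split and explode inside the query keeps all per-row work distributed; only the final, small list crosses back to the driver, so no shared mutable state is needed.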

0 Answers:

No answers yet