I have the following simple program, and I don't know how to read the values inside the array in Scala.
val all_marks = Result.groupBy("class", "school").agg(collect_list("mark") as "marks",count("*") as "cnt").where($"cnt" > 10)
var mrk=all_marks.collect().map(mark=>""+mark(2))
The result looks like this:
mrk: Array[String] = Array(WrappedArray(52.0, 18.0, 17.0, 36.0, 22.0, 22.0), WrappedArray(49.0, 53.0, 41.0, 30.0, 48.0, 36.0))
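I need to iterate over the mrk array and read each WrappedArray separately, so that I can do further mathematical calculations on each mark in each WrappedArray. What is a simple way to read each WrappedArray?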
Answer 0 (score: 0)
You need to replace
var mrk = all_marks.collect().map(mark => "" + mark(2))
with
val mrk = all_marks.select("marks")
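The reason for dropping the string concatenation can be seen in a small plain-Scala sketch. It does not use Spark: the `row` value is only a stand-in for a collected row, the `classA` and `schoolX` fields and the `StringVsTyped` name are made up for illustration, and the marks are the first list from the question's output.

```scala
object StringVsTyped {
  // Stand-in for one collected row: class, school, marks.
  // The first two fields are hypothetical; the marks come from the question.
  val row: Seq[Any] = Seq("classA", "schoolX", Seq(52.0, 18.0, 17.0, 36.0, 22.0, 22.0))

  // What the original map did: concatenation turns the whole list into one String.
  val asText: String = "" + row(2)

  // Keeping the element type lets each mark be read as a number.
  val asMarks: Seq[Double] = row(2).asInstanceOf[Seq[Double]]

  def main(args: Array[String]): Unit = {
    println(asText)      // a single String, no longer a collection of numbers
    println(asMarks.sum) // arithmetic works on the typed sequence
  }
}
```

Once the marks are a String, the individual values can only be recovered by parsing, which is why the answer keeps the column as a typed list.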
Then convert the DataFrame to an RDD of lists, and then back to a DataFrame:
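// toDF needs import spark.implicits._ (already in scope in spark-shell)
// use getSeq[Double] instead if the marks column holds doubles, as in the question
val toRDD = mrk.rdd.map(_.getSeq[Int](0).toList).toDF("marks")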
Then define a UDF:
import org.apache.spark.sql.functions.udf
// define the UDF
val createUdf = udf((list: Seq[Int]) => {
  val ascending = list.sorted // sort in ascending order
  // keep the accumulator local so every row starts from an empty string
  var read_row_by_row = ""
  // add whatever per-element calculation you need inside this loop
  for (i <- ascending.indices) {
    read_row_by_row = read_row_by_row + "," + ascending(i)
  }
  read_row_by_row
})
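The loop in the UDF is where the per-mark calculations go. As a rough sketch of what such a calculation might look like outside Spark, the block below iterates over the two lists from the question's output and computes a minimum, maximum and mean for each. `MarkStats` and `summarize` are hypothetical names introduced here, and the helper assumes every list is non-empty.

```scala
object MarkStats {
  // Hypothetical helper: (min, max, mean) for one list of marks.
  // Assumes the list is non-empty; head/last would throw on an empty one.
  def summarize(marks: Seq[Double]): (Double, Double, Double) = {
    val sorted = marks.sorted
    val mean = sorted.sum / sorted.size
    (sorted.head, sorted.last, mean)
  }

  def main(args: Array[String]): Unit = {
    // The two lists shown in the question's output.
    val mrk: Array[Seq[Double]] = Array(
      Seq(52.0, 18.0, 17.0, 36.0, 22.0, 22.0),
      Seq(49.0, 53.0, 41.0, 30.0, 48.0, 36.0)
    )
    // Iterate over the outer array, then work on each inner list.
    for (marks <- mrk) {
      val (lo, hi, mean) = summarize(marks)
      println(s"min=$lo max=$hi mean=$mean")
    }
  }
}
```

The same body could be placed inside the UDF in place of the string concatenation if the goal is a numeric column instead of a comma-separated string.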
// apply the UDF to the DataFrame built above, overwriting the marks column
val g = toRDD.withColumn("marks", createUdf($"marks"))
g.show
+--------------------+
|               marks|
+--------------------+
|,17,17,17,17,18,1...|
|,18,18,18,18,19,1...|
|,18,23,24,24,24,2...|
|,18,23,24,24,24,2...|
|,17,18,18,18,18,1...|
|,25,35,36,39,41,4...|
|,25,35,36,39,41,4...|
|,31,31,33,33,33,3...|