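I am trying to run some test Spark/Scala code that uses the Spark DataFrame of test data below to find the employees whose salary is above the average salary. However, it fails at execution time with:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate expression: avg(input[4, double, false])
What is the correct syntax to achieve this?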
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, col}

// spark is an existing SparkSession (e.g. the one provided by spark-shell)
val dataDF20 = spark.createDataFrame(Seq(
  (11, "emp1", 2, 45, 1000.0),
  (12, "emp2", 1, 34, 2000.0),
  (13, "emp3", 1, 33, 3245.0),
  (14, "emp4", 1, 54, 4356.0),
  (15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")
val condition1: Column = col("sal") > avg(col("sal"))
val d0 = dataDF20.filter(condition1)
println("------ d0.show() ----")
d0.show()
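For reference, here is the intended computation written against plain Scala collections (no Spark). It makes the ordering explicit: the average is an aggregate over all rows, so it has to exist before any single row can be compared with it. A row-level filter predicate that contains avg tries to do both at once, which is what Spark rejects. The object name AboveAveragePlain is hypothetical; the tuples are the same test data as dataDF20.

```scala
object AboveAveragePlain {
  // Same (empid, name, deptid, age, sal) tuples as dataDF20
  val employees: Seq[(Int, String, Int, Int, Double)] = Seq(
    (11, "emp1", 2, 45, 1000.0),
    (12, "emp2", 1, 34, 2000.0),
    (13, "emp3", 1, 33, 3245.0),
    (14, "emp4", 1, 54, 4356.0),
    (15, "emp5", 2, 76, 56789.0)
  )

  // Step 1: aggregate over ALL rows to get a single value
  val avgSal: Double = employees.map(_._5).sum / employees.size

  // Step 2: compare each row against the already-computed average
  val aboveAvg: Seq[(Int, String, Int, Int, Double)] = employees.filter(_._5 > avgSal)

  def main(args: Array[String]): Unit = {
    println(avgSal)   // 13478.0
    println(aboveAvg) // only the emp5 row
  }
}
```

The average here is 67390.0 / 5 = 13478.0, so only emp5 (56789.0) qualifies.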
Answer (score: 1)
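avg is an aggregate function, so it cannot be evaluated inside a row-level filter predicate. Compute the average first, then filter against it. You can do this in two steps: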
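import spark.implicits._ // enables the $"colName" syntax
import org.apache.spark.sql.functions.avg
val avgVal = dataDF20.select(avg($"sal")).first().getDouble(0) // step 1: average salary as a Double
dataDF20.filter($"sal" > avgVal).show()                         // step 2: keep rows above it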
+-----+----+------+---+-------+
|empid|name|deptid|age|    sal|
+-----+----+------+---+-------+
|   15|emp5|     2| 76|56789.0|
+-----+----+------+---+-------+
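If you would rather express it as a single query, two common alternatives are sketched here: attach the global average to every row with a window function, or cross join the DataFrame with its one-row aggregate. This is a sketch assuming Spark 2.1 or later (crossJoin was added in 2.1) and a local SparkSession; the object and method names (AboveAverageSpark, sampleData, viaWindow, viaCrossJoin) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col}

object AboveAverageSpark {
  // Builds the same test data as dataDF20
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (11, "emp1", 2, 45, 1000.0),
      (12, "emp2", 1, 34, 2000.0),
      (13, "emp3", 1, 33, 3245.0),
      (14, "emp4", 1, 54, 4356.0),
      (15, "emp5", 2, 76, 56789.0)
    ).toDF("empid", "name", "deptid", "age", "sal")
  }

  // Option A: window function. over() with no arguments defines an empty
  // window spanning every row, so each row gets the global average.
  // Spark moves all rows into a single partition for this, so it suits small data.
  def viaWindow(df: DataFrame): DataFrame =
    df.withColumn("avg_sal", avg(col("sal")).over())
      .filter(col("sal") > col("avg_sal"))
      .drop("avg_sal")

  // Option B: aggregate to a one-row DataFrame, then cross join it back
  def viaCrossJoin(df: DataFrame): DataFrame = {
    val avgDF = df.agg(avg(col("sal")).as("avg_sal"))
    df.crossJoin(avgDF)
      .filter(col("sal") > col("avg_sal"))
      .drop("avg_sal")
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch; adjust master/appName as needed
    val spark = SparkSession.builder().master("local[*]").appName("above-average").getOrCreate()
    val df = sampleData(spark)
    viaWindow(df).show()
    viaCrossJoin(df).show()
    spark.stop()
  }
}
```

On this data both variants return the single emp5 row, matching the two-step result. For large tables the two-step approach or the cross join is usually preferable to an unpartitioned window, since the latter funnels every row through one task.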