Spark DataFrame: finding employees whose salary is higher than the organization's average salary

Time: 2019-08-09 04:29:06

Tags: scala apache-spark apache-spark-sql

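I am trying to run some test Spark/Scala code that uses the Spark DataFrame below, built from test data, to find the employees whose salary is higher than the average salary. It fails at execution time with: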

  

Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate expression: avg(input[4, double, false])

What is the correct syntax to achieve this?

val dataDF20 = spark.createDataFrame(Seq(
  (11, "emp1", 2, 45, 1000.0),
  (12, "emp2", 1, 34, 2000.0),
  (13, "emp3", 1, 33, 3245.0),
  (14, "emp4", 1, 54, 4356.0),
  (15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")

// avg(...) is an aggregate expression; using it in a row-level filter
// is what the exception above refers to (column 4 is "sal").
val condition1: Column = col("sal") > avg(col("sal"))

val d0 = dataDF20.filter(condition1)
println("------ d0.show() ----")
d0.show()

1 answer:

Answer 0: (score: 1)

You can do this in two steps: first compute the average as a plain value, then filter on it:

val avgVal = dataDF20.select(avg($"sal")).take(1)(0)(0)
dataDF20.filter($"sal" > avgVal).show()
+-----+----+------+---+-------+
|empid|name|deptid|age|    sal|
+-----+----+------+---+-------+
|   15|emp5|     2| 76|56789.0|
+-----+----+------+---+-------+
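
For comparison, the same result can probably also be obtained in a single expression by attaching the average to every row with a window function and then filtering on it. The following is only a sketch under assumptions: the object name `AboveAverageSalary`, the helper `aboveAverage`, the temporary column `avg_sal`, and the local `SparkSession` settings are illustrative and not part of the original question or answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col}

object AboveAverageSalary {

  // Hypothetical helper: keeps only the rows whose "sal" exceeds the overall mean.
  def aboveAverage(df: DataFrame): DataFrame =
    df
      // over() with no arguments defines an empty window, i.e. all rows
      .withColumn("avg_sal", avg(col("sal")).over())
      .filter(col("sal") > col("avg_sal"))
      .drop("avg_sal")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("above-average-salary")
      .getOrCreate()

    // Same test data as in the question.
    val dataDF20 = spark.createDataFrame(Seq(
      (11, "emp1", 2, 45, 1000.0),
      (12, "emp2", 1, 34, 2000.0),
      (13, "emp3", 1, 33, 3245.0),
      (14, "emp4", 1, 54, 4356.0),
      (15, "emp5", 2, 76, 56789.0)
    )).toDF("empid", "name", "deptid", "age", "sal")

    aboveAverage(dataDF20).show()

    spark.stop()
  }
}
```

With the sample data the mean salary is 13478.0, so only emp5 should remain, matching the table above. Note that an empty window makes Spark move all rows into a single partition (it logs a warning about this), so on large tables the two-step approach is likely the safer choice.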