I just want to know how to run the following MySQL query in Spark SQL.
SELECT first_name,last_name, job_id, salary
FROM employees
WHERE salary >
ALL (SELECT salary FROM employees WHERE job_id = 'SH_CLERK')
ORDER BY salary
In particular, the ALL() part.
Answer 0 (score: 0)
ALL is not currently available in Spark SQL. Many SQL dialects do not support it, and a query that uses it can be rewritten in other ways.
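One way to see why such a rewrite is possible: `salary > ALL (...)` holds when the salary exceeds every value the subquery returns, which is the same as exceeding the largest value, provided the subquery returns at least one row and no NULLs. The plain Scala sketch below (no Spark involved; `AllAsMax`, `greaterThanAll` and `greaterThanMax` are names made up for illustration) expresses both forms over an ordinary list of salaries.

```scala
object AllAsMax {
  // Illustrative salaries standing in for the rows returned by the subquery.
  val clerkSalaries: Seq[Int] = Seq(20000, 60000)

  // salary > ALL (...): the salary must exceed every value in the list.
  def greaterThanAll(salary: Int, values: Seq[Int]): Boolean =
    values.forall(salary > _)

  // salary > (SELECT MAX(...)): compare against the largest value only.
  // Assumes `values` is non-empty; Seq.max throws on an empty sequence.
  def greaterThanMax(salary: Int, values: Seq[Int]): Boolean =
    salary > values.max
}
```

For any non-empty list the two functions agree, for example `AllAsMax.greaterThanAll(65000, AllAsMax.clerkSalaries)` and `AllAsMax.greaterThanMax(65000, AllAsMax.clerkSalaries)` are both true, while both are false for 50000.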
In this case, MAX gives the same result, as follows:
// Run in spark-shell, where `spark` (SparkSession) and `sc` (SparkContext) are predefined.
import spark.implicits._ // provides toDF; already in scope inside spark-shell

// Sample data: first name, job type, salary
val df = sc.parallelize(Seq(
  ("John", "sh_clerk", 20000), ("Peter", "sh_clerk", 60000), ("Sally", "manager", 50000),
  ("Cabe", "programmer", 100000), ("Bob", "accountant", 65000)
)).toDF("first_name", "job_type", "salary")

// Register the DataFrame so that SQL can refer to it by name
df.createOrReplaceTempView("employees")

// salary > ALL (subquery) rewritten as salary > (SELECT MAX(...))
val res = spark.sql("""
  SELECT first_name, job_type, salary
  FROM employees
  WHERE salary > (SELECT MAX(salary) FROM employees WHERE job_type = 'sh_clerk')
  ORDER BY salary
""")
res.show(false)
Returns:
+----------+----------+------+
|first_name|job_type |salary|
+----------+----------+------+
|Bob |accountant|65000 |
|Cabe |programmer|100000|
+----------+----------+------+
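One caveat with the MAX rewrite: if the subquery returns no rows, MAX yields NULL and the filter drops every row, whereas `> ALL` over an empty set would keep every row. If that case matters, a left anti join is one possible alternative. The sketch below is only an illustration under assumptions: `AllViaAntiJoin`, `higherThanAll` and the `other_salary` alias are hypothetical names, and the input is assumed to have the `job_type` and `salary` columns used in the sample data. NULL salaries are not handled the way strict ALL semantics would require.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AllViaAntiJoin {
  // Keep the employees whose salary is greater than every salary of `jobType`.
  def higherThanAll(employees: DataFrame, jobType: String): DataFrame = {
    // Salaries to compare against, renamed so the join condition is unambiguous.
    val others = employees
      .filter(col("job_type") === jobType)
      .select(col("salary").as("other_salary"))

    // left_anti keeps only the left rows with no match on the right, i.e.
    // rows for which no compared salary is greater than or equal to their own.
    employees
      .join(others, col("salary") <= col("other_salary"), "left_anti")
      .orderBy("salary")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("all-via-anti-join")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("John", "sh_clerk", 20000), ("Peter", "sh_clerk", 60000), ("Sally", "manager", 50000),
      ("Cabe", "programmer", 100000), ("Bob", "accountant", 65000)
    ).toDF("first_name", "job_type", "salary")

    higherThanAll(df, "sh_clerk").show(false)    // Bob and Cabe, as with MAX
    higherThanAll(df, "no_such_job").show(false) // nothing to compare against: all rows kept
    spark.stop()
  }
}
```

Because the join condition is not an equality, Spark cannot use a hash or sort-merge join here, so on large tables the MAX version is likely to be cheaper when the subquery is known to be non-empty.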