Below is the dataset I am working with in Spark. I want to add one more column at the end, whose value is determined by the salary range:
`sal >= 1000 && sal <= 2000 = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal > 3000 && sal <= 4000 = Level 3
+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
|  100|   EMP1|1000|  IT|
|  101|   EMP2|2500|ITES|
|  102|   EMP3|3000| BPO|
|  104|   EMP4|4000|ENGG|
+-----+-------+----+----+`
Expected output:
+-----+-------+----+----+-------+
|empid|empName| sal|dept|  Level|
+-----+-------+----+----+-------+
|  100|   EMP1|1000|  IT|Level 1|
|  101|   EMP2|2500|ITES|Level 2|
|  102|   EMP3|3000| BPO|Level 3|
|  104|   EMP4|4000|ENGG|Level 3|
+-----+-------+----+----+-------+
I wrote the following code:
case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select(
  $"_c0".cast("integer").as("empid"),
  $"_c1".cast("string").as("empName"),
  $"_c2".cast("integer").as("sal"),
  $"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee""")
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'when' expecting (line 2, pos 70)
== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee
Answer 0 (score: 0)
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
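
Compared with the query in the question, this version adds the comma after `*` and drops `===`. The missing comma is most likely what triggers the ParseException. `===` is the Scala `Column` equality operator and is not valid inside a SQL string, where equality is written `=`. If you would rather stay in the DataFrame API than register a temp view, the same logic can be sketched with `when` from `org.apache.spark.sql.functions`. The object name `SalaryLevels`, the helper `addLevel`, the `local[*]` master, and the inline sample rows (used in place of `emp.csv`) are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical wrapper object, not part of the original question.
object SalaryLevels {

  // Adds a "level" column derived from "sal". There is no otherwise(),
  // so salaries outside 1000..4000 get null, just like a SQL CASE
  // expression without an ELSE branch.
  def addLevel(df: DataFrame): DataFrame =
    df.withColumn(
      "level",
      when(col("sal") >= 1000 && col("sal") <= 2000, "Level 1")
        .when(col("sal") > 2000 && col("sal") <= 3000, "Level 2")
        .when(col("sal") > 3000 && col("sal") <= 4000, "Level 3")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalaryLevels")
      .master("local[*]") // assumption: running locally, not on a cluster
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question instead of reading emp.csv.
    val employees = Seq(
      (100, "EMP1", 1000, "IT"),
      (101, "EMP2", 2500, "ITES"),
      (102, "EMP3", 3000, "BPO"),
      (104, "EMP4", 4000, "ENGG")
    ).toDF("empid", "empName", "sal", "dept")

    addLevel(employees).show()

    spark.stop()
  }
}
```

With the ranges as stated, a salary of exactly 3000 falls in Level 2, because Level 3 only starts above 3000. If 3000 should count as Level 3, the boundaries need to move (for example `sal >= 3000` for Level 3 and `sal < 3000` for Level 2).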