How to add a column at the end using Spark

Date: 2018-11-11 06:28:00

Tags: apache-spark apache-spark-sql

Below is my dataset in Spark. I want to add one more column at the end, with its value decided by the salary range:

`sal >= 1000 && sal <=2000  = Level 1
sal > 2000 && sal <= 3000  = Level 2 
sal >3000 && sal <=4000 = Level 3

+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
|  100|  EMP1 |1000|IT  |
|  101|  EMP2 |2500|ITES|
|  102|  EMP3 |3000|BPO |
|  104|  EMP4 |4000|ENGG|
+-----+-------+----+----+`

Expected output

+-----+-------+----+----+-------+
|empid|empName| sal|dept|  Level|
+-----+-------+----+----+-------+
|  100|  EMP1 |1000|IT  |Level 1|
|  101|  EMP2 |2500|ITES|Level 2|
|  102|  EMP3 |3000|BPO |Level 3|
|  104|  EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-------+

I wrote the code below, which fails with a parse error:

case class mySchema(empid: Int, empName: String, sal: Int, post: String)

import spark.implicits._

val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv")
  .select(
    $"_c0".cast("integer").as("empid"),
    $"_c1".cast("string").as("empName"),
    $"_c2".cast("integer").as("sal"),
    $"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE' when (sal === 2000) then 'SE' else 'SSE'
end as level from Employee""")

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'when' expecting (line 2, pos 70)

== SQL ==
select  *   case when (sal === 1000) then 'ASE' 
----------------------------------------------------------------------^^^
                                                                 when (sal === 2000) then 'SE' 
                                                                 else 'SSE'  
                                                                 end as  level  from Employee                        

1 answer:

Answer 0 (score: 0)

select  
*,  
case   
when (sal >=1000 and sal <= 2000) then 'Level 1'  
when (sal > 2000 and sal <= 3000) then 'Level 2'  
when (sal > 3000 and sal <= 4000) then 'Level 3'
end   
as level  
from Employee
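
The query above differs from the one in the question in two ways: it puts a comma after `*`, and it uses plain SQL comparisons (`>=`, `<=`, `and`) rather than `===`, which is an operator on Scala `Column` objects and not part of SQL syntax. Below is a minimal sketch of how this might be wired into a complete program, assuming a local Spark session and an in-memory copy of the sample rows in place of the CSV file. The object name `AddLevelColumn` and the helpers `withLevelSql` and `withLevelDsl` are hypothetical names chosen for illustration. The second helper shows an equivalent approach using `withColumn` with `when`, which also appends the new column at the end.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical wrapper object; the names here are not from the original post.
object AddLevelColumn {

  // Variant 1: SQL text. Note the comma after `*` and the SQL-style comparisons.
  def withLevelSql(df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("Employee")
    df.sparkSession.sql(
      """select *,
        |       case
        |         when sal >= 1000 and sal <= 2000 then 'Level 1'
        |         when sal >  2000 and sal <= 3000 then 'Level 2'
        |         when sal >  3000 and sal <= 4000 then 'Level 3'
        |       end as level
        |from Employee""".stripMargin)
  }

  // Variant 2: DataFrame API. withColumn appends "level" as the last column.
  // Rows matching no branch get null, the same as a CASE without ELSE.
  def withLevelDsl(df: DataFrame): DataFrame = {
    val sal = col("sal")
    df.withColumn("level",
      when(sal >= 1000 && sal <= 2000, "Level 1")
        .when(sal > 2000 && sal <= 3000, "Level 2")
        .when(sal > 3000 && sal <= 4000, "Level 3"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AddLevelColumn")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question, replacing the CSV read.
    val df = Seq(
      (100, "EMP1", 1000, "IT"),
      (101, "EMP2", 2500, "ITES"),
      (102, "EMP3", 3000, "BPO"),
      (104, "EMP4", 4000, "ENGG")
    ).toDF("empid", "empName", "sal", "dept")

    withLevelSql(df).show()
    withLevelDsl(df).show()
    spark.stop()
  }
}
```

With the ranges exactly as stated in the question, a salary of 3000 falls into Level 2, since the second range includes its upper bound. If 3000 should instead be Level 3, the boundaries would need to be adjusted accordingly.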