Question

I am new to Spark DataFrames. I have a text file with data like this:

schoolid,classid,studentid,subject,marks
bjs,5,111,hindi,23
bjs,5,222,maths,78
bjs,7,333,bio,89
bjs,1,444,chemistry,67
ghs,2,555,bio,78
ghs,2,666,phy,56
ghs,9,777,drawing,56

I want to load this data into a DataFrame and add 1 to every value in the "marks" column.

The code I am using is:

val df = sparkSession.read.format("csv").option("header","true").load("samplefile1.txt")
 val newdf = df.select(col($"marks"+1)).show()

But I get this error:

type mismatch; found : org.apache.spark.sql.Column required: String

Could someone help me with the correct syntax?

Answer 1

Try the following solution:

filter

Answer 2

 df.withColumn("marks", expr("marks +1").cast("integer")).show

Output:

+--------+-------+---------+---------+-----+
|schoolid|classid|studentid|  subject|marks|
+--------+-------+---------+---------+-----+
|     bjs|      5|      111|    hindi|   24|
|     bjs|      5|      222|    maths|   79|
|     bjs|      7|      333|      bio|   90|
|     bjs|      1|      444|chemistry|   68|
|     ghs|      2|      555|      bio|   79|
|     ghs|      2|      666|      phy|   57|
|     ghs|      9|      777|  drawing|   57|
+--------+-------+---------+---------+-----+
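
For reference, the error in the question arises because col(...) expects a column name as a String, while $"marks" + 1 is already a Column. Also note that .show() returns Unit, so assigning its result to newdf does not give you a DataFrame. Below is a minimal sketch of the same fix using the Column API instead of expr. The object name AddOneToMarks, the helper addOne, and the local[*] master are assumptions made for illustration; the file name is taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AddOneToMarks {
  // Hypothetical helper: returns a copy of df with "marks" incremented by 1.
  // Without schema inference the CSV reader yields string columns,
  // so the column is cast to int before the addition.
  def addOne(df: DataFrame): DataFrame =
    df.withColumn("marks", col("marks").cast("int") + 1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .appName("AddOneToMarks")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .format("csv")
      .option("header", "true")
      .load("samplefile1.txt")

    // Keep the DataFrame in a val and call show() separately.
    val newDf = addOne(df)
    newDf.show()

    spark.stop()
  }
}
```

If you prefer the $"marks" syntax, it should work the same way after import spark.implicits._, for example df.withColumn("marks", $"marks".cast("int") + 1).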

Spark Scala DataFrame: add 1 to all values in a column
