Spark DataFrame column name not recognized

Date: 2018-05-14 02:46:43

Tags: scala apache-spark

The Spark DataFrame df has the following column names:

scala> df.columns
res6: Array[String] = Array(Age, Job, Marital, Education, Default, Balance, Housing, Loan, Contact, Day, Month, Duration, Campaign, pdays, previous, poutcome, Approved)

Querying df by column name with SQL works fine:

scala> spark.sql(""" select Age from df limit 2 """).show()
+---+
|Age|
+---+
| 30|
| 33|
+---+

But I run into a problem when I try to use withColumn on df:

scala> val dfTemp = df.withColumn("temp", df.Age.cast(DoubleType))
         .drop("Age").withColumnRenamed("temp", "Age")
<console>:38: error: value Age is not a member of org.apache.spark.sql.DataFrame

The code above was taken from here

Thanks

1 answer:

Answer 0: (score: 0)

df.Age is not a valid way to reference a column of a DataFrame in Scala. The correct way is

val dfTemp = df.withColumn("temp", df("Age").cast(DoubleType))

Or you can do

val dfTemp = df.withColumn("temp", df.col("Age").cast(DoubleType))

Or you can do

import org.apache.spark.sql.functions.col
val dfTemp = df.withColumn("temp", col("Age").cast(DoubleType))

Note: df.withColumn("temp", df.Age.cast(DoubleType())) is valid in PySpark, where columns can be accessed as attributes of the DataFrame, but not in the Scala API.
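Putting the fix together, here is a minimal, self-contained sketch of the full cast-drop-rename pipeline from the question, using `col(...)` instead of `df.Age`. It builds a toy two-column DataFrame standing in for the original (the data values are made up for illustration) and runs against a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object CastAgeSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; the original df has many more columns.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-age-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the original DataFrame (assumed shape: Age is an Int).
    val df = Seq((30, "management"), (33, "technician")).toDF("Age", "Job")

    // Cast Age to Double via col(...), drop the old column, rename the new one.
    val dfTemp = df
      .withColumn("temp", col("Age").cast(DoubleType))
      .drop("Age")
      .withColumnRenamed("temp", "Age")

    dfTemp.printSchema()  // Age is now DoubleType
    spark.stop()
  }
}
```

The same pattern works with `df("Age")` or `df.col("Age")` in place of `col("Age")`; the qualified forms matter only when a join produces ambiguous column names.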