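I am trying to add an empty column between two columns in a DataFrame select statement. With the withColumn function I can only append it as the last column, but I need blank columns in the middle (as the 3rd and 6th columns), as shown below.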
val product1 = product.select("_c1","_c2"," ","_c4", "_c5", "_c5", " ", "c6")
I tried using withColumn in the middle of the select statement, as shown below, and it gives an error:
val product1 = product.select("_c1","_c2",product.withColumn("NewCol",lit(None).cast("string")),"_c4", "_c5", "_c5", " ", "c6")
>error: overloaded method value select with alternatives:
(col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
(cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (String, String, String, String, String, String, String, String, org.apache.spark.sql.DataFrame, String)
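Please let me know if you have any suggestions. Thanks.
Answer 0 (score: 1)
To select columns from a DataFrame, you can pass either strings (column names) or columns (of type Column) as input. From the documentation: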
def select(col: String, cols: String*): DataFrame Selects a set of columns.
def select(cols: Column*): DataFrame Selects a set of column based expressions.
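However, the two cannot be mixed in a single call, and a DataFrame (which is what withColumn returns) is not a valid argument at all. In this case, use the select overload that takes Column arguments. To get a column by name, use the col function or $ (after importing the Spark implicits). A literal column such as lit(" ") can then be placed at any position in the list.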
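import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder()....  // session setup elided
import spark.implicits._  // enables the $"name" column syntax
val product1 = product.select($"_c1", $"_c2", lit(" ").as("newCol1"), $"_c4", $"_c5", $"_c5", lit(" ").as("newCol2"), $"c6")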
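If the new columns should hold nulls rather than a single-space string, as the withColumn attempt in the question suggests, the same approach can be sketched with col and a typed null literal. This is only a minimal illustration: the local session settings, the sample data, and the names BlankColumnSketch, newCol1 and newCol2 are assumptions, not part of the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

object BlankColumnSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("blank-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical one-row sample standing in for the asker's `product` DataFrame.
    val product = Seq(("a", "b", "c", "d", "e"))
      .toDF("_c1", "_c2", "_c4", "_c5", "_c6")

    // Every argument is a Column, so the select(cols: Column*) overload applies.
    // lit(null).cast("string") yields a nullable string column filled with nulls.
    val product1 = product.select(
      col("_c1"),
      col("_c2"),
      lit(null).cast("string").as("newCol1"),
      col("_c4"),
      col("_c5"),
      lit(null).cast("string").as("newCol2"),
      col("_c6")
    )

    product1.printSchema()
    spark.stop()
  }
}
```

Giving each inserted column its own alias keeps the column names in the result unique, which avoids ambiguous references if the columns are used later.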