Scala / Spark: how to pass this parameter to the .select statement

Asked: 2019-01-10 11:52:02

Tags: scala apache-spark

I have a method that takes a subset of a DataFrame:

This works:
val subset_cols = {joinCols :+ col}
val df1_subset = df1.select(subset_cols.head, subset_cols.tail: _*)
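As a plain-Scala sketch (no Spark needed), here is what the head/tail splat pattern actually hands to `select(String, String*)` — the column names below are taken from the question, and the `subsetCols` value is an assumed stand-in for the question's `subset_cols`:

```scala
// Build the column list the same way the question does.
val joinCols = Seq("first_name", "last_name")
val extraCol = "rank_dr"            // stand-in for the question's `col` variable
val subsetCols = joinCols :+ extraCol

// select(String, String*) takes one name, then varargs — so the call
// df1.select(subsetCols.head, subsetCols.tail: _*) splits the Seq like this:
val head = subsetCols.head          // "first_name"
val tail = subsetCols.tail          // Seq("last_name", "rank_dr")
// `tail: _*` expands the remaining Seq into one argument per element,
// so Spark resolves each name as a separate column.
```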

This does not work (the code compiles, but fails at runtime):

val subset_cols = {joinCols :+ col}
val df1_subset = df1.select(subset_cols.deep.mkString(","))

Error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: 
cannot resolve '`first_name,last_name,rank_dr`' given input columns: 
[model, first_name, service_date, rank_dr, id, purchase_date, 
dealer_id, purchase_price, age, loyalty_score, vin_num, last_name, color];;

'Project ['first_name,last_name,rank_dr]

I am trying to pass subset_cols to the .select method, but it seems I am missing some formatting step. Can anyone help?

Thx

1 Answer:

Answer 0: (score: 1)

What you are doing is:

df1.select("first_name,last_name,rank_dr")

which makes Spark look for a single column named "first_name,last_name,rank_dr", which does not exist.
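A plain-Scala sketch of why this happens: mkString collapses the whole sequence into one string, so select receives a single column name rather than three. The names below are the ones from the question:

```scala
// Build the same column list as the question.
val joinCols = Seq("first_name", "last_name")
val subsetCols = joinCols :+ "rank_dr"

// mkString joins ALL elements into ONE string:
val asOneString = subsetCols.mkString(",")
// asOneString is "first_name,last_name,rank_dr" — a single value, not three,
// which is why the AnalysisException shows the whole comma-joined string
// backtick-quoted as one unresolvable column name.
```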

Try:

val df1_subset = df1.selectExpr(subset_cols: _*)
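For completeness, a hedged sketch of both working call shapes, assuming subset_cols is a Seq[String] and that a DataFrame like the question's df1 exists (the DataFrame construction here is illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical standalone setup (column names from the question).
val spark = SparkSession.builder().master("local[*]").appName("subset-demo").getOrCreate()
import spark.implicits._
val df1 = Seq(("Jane", "Doe", 1)).toDF("first_name", "last_name", "rank_dr")

val subset_cols = Seq("first_name", "last_name") :+ "rank_dr"

// Option 1 (the answer): each string is parsed as a SQL expression.
val bySelectExpr = df1.selectExpr(subset_cols: _*)

// Option 2: map each name to a Column and splat into select(Column*),
// which skips SQL expression parsing entirely.
val bySelect = df1.select(subset_cols.map(col): _*)
```

Option 2 is often preferred when the strings are plain column names, since selectExpr would also interpret expressions like "a + b".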