I have found one way to select a subset of a dataframe's columns. This works:
val subset_cols = joinCols :+ col
// select(head: String, tail: String*) resolves each name as a separate column
val df1_subset = df1.select(subset_cols.head, subset_cols.tail: _*)
This does not work (the code compiles, but fails at runtime):
val subset_cols = joinCols :+ col
// mkString collapses all the names into a single comma-separated string
val df1_subset = df1.select(subset_cols.deep.mkString(","))
Error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '`first_name,last_name,rank_dr`' given input columns:
[model, first_name, service_date, rank_dr, id, purchase_date,
dealer_id, purchase_price, age, loyalty_score, vin_num, last_name, color];;
'Project ['first_name,last_name,rank_dr]
I am trying to pass subset_cols to the .select method, but it seems I am missing some kind of formatting. Can anyone help?
Thx
Answer 0 (score: 1)
What you are effectively doing is:
df1.select("first_name,last_name,rank_dr")
which makes Spark look for a single, non-existent column named "first_name,last_name,rank_dr".
Try:
val df1_subset = df1.selectExpr(subset_cols: _*)
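
For context, here is a minimal self-contained sketch of the fix, assuming joinCols is an Array[String]; the sample data is made up to match a few of the column names in the error message, and the question's col variable is renamed to colName so it does not shadow functions.col:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("subset").getOrCreate()
import spark.implicits._

// Made-up sample data for illustration only
val df1 = Seq(
  ("John", "Doe", 1, "red"),
  ("Jane", "Roe", 2, "blue")
).toDF("first_name", "last_name", "rank_dr", "color")

val joinCols = Array("first_name", "last_name")
val colName = "rank_dr"
val subset_cols = joinCols :+ colName

// selectExpr treats each string as its own expression, so all three columns resolve
val df1_subset = df1.selectExpr(subset_cols: _*)
df1_subset.show()

// Equivalent alternative: turn the names into Column objects first
val df1_subset2 = df1.select(subset_cols.map(col): _*)
df1_subset2.show()

Both forms avoid the original problem: they pass several column names rather than one comma-joined string.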