I am trying to apply the following pivot to a DataFrame:
val pivot_company_model_vals_df = company_model_vals_df
  .groupBy("company_id", "model_id", "data_date")
  .pivot("data_item_code")
  .agg(
    when(col("data_item_value_numeric").isNotNull, first("data_item_value_numeric"))
      .otherwise(first("data_item_value_string"))
  )
Error:
org.apache.spark.sql.AnalysisException: expression '`data_item_value_numeric`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [company_id#123, model_id#142, data_date#161], [company_id#123, model_id#142, data_date#161,
CASE WHEN isnotnull(data_item_value_numeric#199) THEN cast(first(if ((data_item_code#180 <=> assetturnover)) data_item_value_numeric#199 else cast(null as double), true) as string)
ELSE first(if ((data_item_code#180 <=> assetturnover)) data_item_value_string#218 else cast(null as string), true) END AS assetturnover#320,
CASE WHEN isnotnull(data_item_value_numeric#199) THEN cast(first(if ((data_item_code#180 <=> focfdebt_percontr)) data_item_value_numeric#199 else cast(null as double), true) as string) ELSE first(if ((data_item_code#180 <=> focfdebt_percontr)) data_item_value_string#218 else cast(null as string), true) END AS focfdebt_percontr#374, CASE WHEN isnotnull(data_item_value_numeric#199) THEN cast(first(if ((data_item_code#180 <=> focfdebt_sensitivity)) data_item_value_numeric#199 else cast(null as double), true) as string) ELSE first(if ((data_item_code#180 <=> focfdebt_sensitivity)) data_item_value_string#218 else cast(null as string), true) END AS focfdebt_sensitivity#377,
CASE WHEN isnotnull(data_item_value_numeric#199) THEN cast(first(if ((data_item_code#180 <=> gearingratio1)) data_item_value_numeric#199 else cast(null as double), true) as string) ELSE first(if ((data_item_code#180 <=> gearingratio1)) data_item_value_string#218 else cast(null as string), true) END AS gearingratio1#380, ... 20 more fields]
+- AnalysisBarrier
Can you help me see what I am doing wrong here? Thanks.
Answer 0 (score: 3)
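The condition inside when(...) reads col("data_item_value_numeric") outside of any aggregate function, and that column is not one of the groupBy columns, which is exactly what the AnalysisException complains about. The problem is fixed by moving first so that it wraps the whole when expression, i.e. .agg( first(when(...)) ), as below: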
val pivot_company_model_vals_df = company_model_vals_df
  .groupBy("company_id", "model_id", "data_date")
  .pivot("data_item_code")
  .agg(
    first(
      when(col("data_item_value_numeric").isNotNull, col("data_item_value_numeric"))
        .otherwise(col("data_item_value_string"))
    )
  )
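To try this out end to end, here is a small sketch you can paste into spark-shell (use :paste). The two sample rows, the "rating" item code, and the local SparkSession settings are made up for illustration and are not from the question; only the column names and the pivot expression follow the code above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first, when}

// In spark-shell this simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("pivot-first-when")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: one numeric item and one string item
// for the same (company_id, model_id, data_date) group.
val company_model_vals_df = Seq[(Long, Long, String, String, Option[Double], Option[String])](
  (1L, 10L, "2019-01-01", "assetturnover", Some(1.5), None),
  (1L, 10L, "2019-01-01", "rating", None, Some("AA"))
).toDF("company_id", "model_id", "data_date",
  "data_item_code", "data_item_value_numeric", "data_item_value_string")

// The per-row choice (numeric if present, otherwise string) happens inside
// first(...), so every column reference sits within an aggregate function.
val pivoted = company_model_vals_df
  .groupBy("company_id", "model_id", "data_date")
  .pivot("data_item_code")
  .agg(
    first(
      when(col("data_item_value_numeric").isNotNull, col("data_item_value_numeric"))
        .otherwise(col("data_item_value_string"))
    )
  )

pivoted.show(false)
```

Note that the two branches have different types (double and string), so Spark casts the numeric value to string and every pivoted column comes out as a string column. Since the expression is just "take the first non-null of the two", coalesce(col("data_item_value_numeric").cast("string"), col("data_item_value_string")) inside first(...) should behave the same way and makes the cast explicit.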