Question

I have a DataFrame with the following schema:

scala> final_df.printSchema
root
 |-- mstr_prov_id: string (nullable = true)
 |-- prov_ctgry_cd: string (nullable = true)
 |-- prov_orgnl_efctv_dt: timestamp (nullable = true)
 |-- prov_trmntn_dt: timestamp (nullable = true)
 |-- prov_trmntn_rsn_cd: string (nullable = true)
 |-- npi_rqrd_ind: string (nullable = true)
 |-- prov_stts_aray_txt: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- PROV_STTS_KEY: string (nullable = true)
 |    |    |-- PROV_STTS_EFCTV_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_CD: string (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_RSN_CD: string (nullable = true)

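I am running the following code to do some basic cleansing, but it does not work inside "prov_stts_aray_txt": it does not descend into the array type and apply the transformation as expected. I want to iterate through all fields of the DataFrame, both flat and nested, and apply the same basic transformation to each.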

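import org.apache.spark.sql.functions._

// final_df is declared as a var; dtypes lists top-level columns only
for (dt <- final_df.dtypes) {
  final_df = final_df.withColumn(dt._1, when(upper(trim(col(dt._1))) === "NULL", lit(" ")).otherwise(col(dt._1)))
}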
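For reference, the loop above only visits top-level columns, because `dtypes` does not descend into arrays or structs. Below is a minimal sketch of one possible way to reach the nested fields, not a definitive implementation. It assumes Spark 3.0 or later (for the Scala `functions.transform` higher-order function); the names `NestedClean`, `clean`, and `cleanAll` are hypothetical helpers introduced here for illustration. It walks the schema recursively and applies the same "NULL" to " " replacement to every string field, leaving timestamps and other types untouched.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical helper object; assumes Spark 3.0+ for functions.transform.
object NestedClean {

  // Rebuild a column expression by walking its DataType recursively.
  def clean(c: Column, dt: DataType): Column = dt match {
    // Leaf string: same rule as the original loop.
    case StringType =>
      when(upper(trim(c)) === "NULL", lit(" ")).otherwise(c)

    // Struct: clean each field and reassemble under the same field names.
    // when(...) without otherwise yields null, so a null struct stays null.
    case st: StructType =>
      val fields = st.fields.map(f => clean(c.getField(f.name), f.dataType).as(f.name))
      when(c.isNotNull, struct(fields: _*))

    // Array: apply the cleaning to every element.
    case ArrayType(elementType, _) =>
      transform(c, x => clean(x, elementType))

    // Timestamps, numbers, maps, etc. are passed through unchanged.
    case _ => c
  }

  // Apply clean to every top-level column, keeping names and order.
  def cleanAll(df: DataFrame): DataFrame =
    df.select(df.schema.fields.map(f => clean(col(f.name), f.dataType).as(f.name)): _*)
}
```

With this in scope, something like `val cleaned = NestedClean.cleanAll(final_df)` would replace the loop. Because it builds a single `select`, it also avoids reassigning `final_df` once per column. Column names containing dots would need backtick quoting, which this sketch does not handle.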

Please help.

Thanks

Iterating through nested elements in Spark

0 answers: