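Hi all, I'm new to Spark/Scala. I'd like to rename some nested JSON fields, because my LATERAL VIEW query fails when several JSON fields share the same name.
I want to rename the EffDate and ExpDate fields inside EmployeeAddr and EmployeePhone.
I've tried both withColumnRenamed and withColumn, but for some reason neither of them works for me.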
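A likely reason for the failure: according to the Spark API docs, `withColumnRenamed` is a no-op when the schema has no top-level column with the given name. A dotted path such as `EmployeeAddr.EffDate` is therefore silently ignored. The sketch below shows one alternative, which rebuilds each array element with `transform` and `struct` and renames fields along the way. It is not the approach from the answer. It assumes Spark 3.0+, where `functions.transform` is available in the Scala API. The helper `renameInArray`, the new names (`AddrEffDate`, `PhoneEffDate`, ...) and the one-record sample are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, transform, when}
import org.apache.spark.sql.types.{ArrayType, StructType}

object RenameNestedWithTransform {
  // Rebuild every element of an array-of-struct column, renaming the fields listed in
  // `renames` (old -> new) and keeping all other fields under their original names.
  def renameInArray(df: DataFrame, arrayCol: String, renames: Map[String, String]): DataFrame = {
    // Read the element struct's field names (in order) from the existing schema.
    val fieldNames = df.schema(arrayCol).dataType
      .asInstanceOf[ArrayType].elementType
      .asInstanceOf[StructType].fieldNames
    df.withColumn(arrayCol, transform(col(arrayCol), (e: Column) =>
      // `when` without `otherwise` keeps null elements null instead of a struct of nulls.
      when(e.isNotNull, struct(fieldNames.map(f => e.getField(f).as(renames.getOrElse(f, f))): _*))))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("rename-nested").getOrCreate()
    import spark.implicits._
    // Hypothetical one-record sample containing a subset of the question's fields.
    val json = """{"EmployeeId":"1","EmployeeAddr":[{"City":"Toronto","EffDate":"2019-01-01","ExpDate":"2020-01-01"}],"EmployeePhone":[{"PhoneType":"Cell","EffDate":"2019-02-01","ExpDate":"2020-02-01"}]}"""
    val employee = spark.read.json(Seq(json).toDS())
    val renamed = renameInArray(
      renameInArray(employee, "EmployeeAddr", Map("EffDate" -> "AddrEffDate", "ExpDate" -> "AddrExpDate")),
      "EmployeePhone", Map("EffDate" -> "PhoneEffDate", "ExpDate" -> "PhoneExpDate"))
    renamed.printSchema()
    spark.stop()
  }
}
```

This keeps the work inside DataFrame column expressions, so Spark can still optimize the query. The cost is that the full list of element fields has to be read from the schema first.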
Code to load the data into a DataFrame:
val Employee = spark.read.format(Employeefile_type).option("header", "true").option("inferSchema", "true").load(file_loction)
root
|-- BirthDate: string (nullable = true)
|-- EmployeeId: string (nullable = true)
|-- EmployeeAddr: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- AddrTypeName: string (nullable = true)
| | |-- City: string (nullable = true)
| | |-- CtryCode: string (nullable = true)
| | |-- EffDate: string (nullable = true)
| | |-- ExpDate: string (nullable = true)
| | |-- PostalCode: string (nullable = true)
| | |-- Province: string (nullable = true)
| | |-- Street1: string (nullable = true)
| | |-- Street2: string (nullable = true)
|-- EmployeeEmail: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- CrewEmailAddr: string (nullable = true)
| | |-- EmailType: string (nullable = true)
|-- EmployeeEmerContact: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Addr: string (nullable = true)
| | |-- FirstName: string (nullable = true)
| | |-- LastName: string (nullable = true)
| | |-- PrimaryPhone: string (nullable = true)
| | |-- Relatnshp: string (nullable = true)
| | |-- Title: string (nullable = true)
|-- EmployeeEmplymntStatus: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- EmplymntStatusCode: string (nullable = true)
| | |-- EmplymntStatusReason: string (nullable = true)
| | |-- EndDate: string (nullable = true)
| | |-- StartDate: string (nullable = true)
|-- EmployeePhone: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- EmployeePhoneNumber: string (nullable = true)
| | |-- EffDate: string (nullable = true)
| | |-- ExpDate: string (nullable = true)
| | |-- PhoneType: string (nullable = true)
Answer 0 (score: 0)
You can apply the solution described here:
How to rename fields in an DataFrame corresponding to nested JSON
Replace the DataFrame's schema, i.e., recreate the DataFrame with a new schema in which the nested fields have their new names.
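Here is a minimal sketch of that schema-replacement idea, assuming the data is loaded as in the question. It builds a copy of the schema in which the element fields of `EmployeeAddr` and `EmployeePhone` are renamed, then passes `df.rdd` together with the new schema to `spark.createDataFrame`. The helper names, the replacement names (`AddrEffDate`, `PhoneEffDate`, ...) and the one-record sample are illustrative assumptions, not part of the linked answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode_outer
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}

object RenameNestedViaSchema {
  // For each array-of-struct column named in `renames`, rename its element fields (old -> new).
  // All other columns and fields keep their names, types and order.
  def renameSchema(schema: StructType, renames: Map[String, Map[String, String]]): StructType =
    StructType(schema.fields.map {
      case f @ StructField(name, ArrayType(elem: StructType, containsNull), _, _) if renames.contains(name) =>
        val m = renames(name)
        val newElem = StructType(elem.fields.map(sf => sf.copy(name = m.getOrElse(sf.name, sf.name))))
        f.copy(dataType = ArrayType(newElem, containsNull))
      case other => other
    })

  // Recreate the DataFrame from its rows with the new schema. Only names change,
  // so each Row still lines up with the renamed schema by position.
  def withRenamedFields(spark: SparkSession, df: DataFrame, renames: Map[String, Map[String, String]]): DataFrame =
    spark.createDataFrame(df.rdd, renameSchema(df.schema, renames))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("rename-via-schema").getOrCreate()
    import spark.implicits._
    // Hypothetical one-record sample containing a subset of the question's fields.
    val json = """{"EmployeeId":"1","EmployeeAddr":[{"City":"Toronto","EffDate":"2019-01-01","ExpDate":"2020-01-01"}],"EmployeePhone":[{"PhoneType":"Cell","EffDate":"2019-02-01","ExpDate":"2020-02-01"}]}"""
    val employee = spark.read.json(Seq(json).toDS())
    val renames = Map(
      "EmployeeAddr"  -> Map("EffDate" -> "AddrEffDate", "ExpDate" -> "AddrExpDate"),
      "EmployeePhone" -> Map("EffDate" -> "PhoneEffDate", "ExpDate" -> "PhoneExpDate"))
    val renamed = withRenamedFields(spark, employee, renames)
    renamed.printSchema()
    // Exploding both arrays and flattening them no longer produces duplicate EffDate/ExpDate columns.
    renamed
      .withColumn("addr", explode_outer($"EmployeeAddr"))
      .withColumn("phone", explode_outer($"EmployeePhone"))
      .select($"EmployeeId", $"addr.*", $"phone.*")
      .show(false)
    spark.stop()
  }
}
```

The trade-off is that converting to `df.rdd` and back takes the data out of Spark's optimized DataFrame path and materializes every row as a `Row` object. On large inputs, this may be slower than a pure column-expression approach.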