I have a df with the following schema:
root
|-- AddressBook: struct (nullable = true)
| |-- ContactInformationsList: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- ContactId: string (nullable = true)
| | | |-- ContactMeansDesc: string (nullable = true)
| | | |-- IsPrimaryMeans: boolean (nullable = true)
| | | |-- TypeMeansContactId: string (nullable = true)
| | | |-- Value: string (nullable = true)
| |-- PersonData: struct (nullable = true)
| | |-- BirthDate: string (nullable = true)
| | |-- CSP: string (nullable = true)
| | |-- Civility: string (nullable = true)
| | |-- FirstName: string (nullable = true)
| | |-- Gender: string (nullable = true)
| | |-- LastName: string (nullable = true)
| | |-- MaritalStatus: string (nullable = true)
| | |-- SBirthDate: string (nullable = true)
| | |-- Title: string (nullable = true)
|-- PublicId: string (nullable = true)
|-- Version: long (nullable = true)
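This dataframe holds prod data, so I want to change some personal information. Basically, I want to replace the column AddressBook.Persondata.Lastname with a hash of its value.
I tried: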
df.withColumn(
    'AddressBook.Persondata.Lastname',
    F.hash(F.col('AddressBook.Persondata.Lastname'))
)
But it just added another column:
|-- AddressBook.Persondata.Lastname: int (nullable = true)
Is there a simple way to modify my data?