PySpark: changing the data type of nested array elements

Time: 2019-03-12 14:47:09

Tags: python pyspark apache-spark-sql

How can I change the type of a field inside a nested array (transaction_date) from string to timestamp? This is the schema of my Spark DataFrame:

root
 |-- id
 |-- data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- transaction: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- timestamp: string (nullable = true)
 |    |    |    |    |-- transaction_date: string (nullable = true)

I tried the following code, but it returns an error:

df = df.withColumn("transaction_date", df.data.transaction.transaction_date.cast(TimestampType()))
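
One possible approach, sketched here under assumptions: the attempt above likely fails because the path data.transaction.transaction_date passes through two arrays, so it does not resolve to a single string column that can be cast. Instead, the whole data column can be cast to a type that repeats the schema above with transaction_date declared as timestamp. The helper name cast_transaction_dates and the constant NEW_DATA_TYPE are hypothetical, and the sketch assumes the date strings are in a format that Spark's default string-to-timestamp cast understands; values it cannot parse typically become null.

```python
# Assumption: Spark can cast array<struct<...>> to another array<struct<...>>
# with the same field layout, converting each field individually.
# NEW_DATA_TYPE mirrors the schema in the question, except that
# transaction_date is declared as timestamp instead of string.
# The field name `timestamp` is backtick-quoted so it is not read as a type.
NEW_DATA_TYPE = (
    "array<struct<transaction:array<struct<"
    "`timestamp`:string,transaction_date:timestamp>>>>"
)


def cast_transaction_dates(df):
    """Return df with data.transaction.transaction_date cast to timestamp.

    cast_transaction_dates is a hypothetical helper name, not a PySpark API.
    """
    # Replace the whole `data` column: withColumn("transaction_date", ...)
    # would add a new top-level column rather than update the nested field.
    return df.withColumn("data", df["data"].cast(NEW_DATA_TYPE))
```

With the DataFrame from the question, cast_transaction_dates(df).printSchema() should then report transaction_date as timestamp. If the strings use a custom date format, the transform higher-order function available in Spark SQL 2.4+ combined with to_timestamp and an explicit format may be a better fit.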

0 answers:

No answers yet.