Is there a better way to convert Array<int> to Array<string> in PySpark?

Asked: 2018-01-05 03:49:04

Tags: apache-spark pyspark apache-spark-sql spark-dataframe

I have a very large DataFrame with this schema:

root
 |-- id: string (nullable = true)
 |-- ext: array (nullable = true)
 |    |-- element: integer (containsNull = true)

So far I have tried to explode the data and then rebuild the array with collect_list:

select
  id,
  collect_list(cast(item as string))
from default.dual
lateral view explode(ext) t as item
group by
  id

But this approach is too heavy.

1 Answer:

Answer 0: (score: 7)

You can simply cast the ext column to an array of strings:
# `source` is the original DataFrame from the question
df = source.withColumn("ext", source.ext.cast("array<string>"))
df.printSchema()
df.show()
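
For anyone who wants to try the cast on a small example first, here is a minimal sketch. The helper name `cast_array_to_string`, the `demo` function, and the sample rows are made up for illustration and are not from the original post; `demo` assumes a local Spark installation is available.

```python
def cast_array_to_string(df, column):
    """Return df with `column` cast to array<string>.

    Hypothetical helper: it only wraps the withColumn/cast call
    shown in the answer above.
    """
    return df.withColumn(column, df[column].cast("array<string>"))


def demo():
    """Build a tiny DataFrame (made-up rows) and apply the cast."""
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("cast-demo")
        .getOrCreate()
    )
    # Same shape as the schema in the question: id string, ext array<int>.
    source = spark.createDataFrame(
        [("a", [1, 2, 3]), ("b", [4, 5])],
        "id string, ext array<int>",
    )
    df = cast_array_to_string(source, "ext")
    df.printSchema()  # the element type of ext should now be string
    df.show()
    spark.stop()
```

Calling `demo()` prints the new schema and the converted rows. Because the cast is an ordinary per-row column expression, it should avoid the shuffle that the explode plus group by version in the question needs.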