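TL;DR: I want to convert a column containing the following list of lists into a normal list of lists (or, better, avoid having the list contain a WrappedArray in the first place). I have looked at many posts, and the one I found most relevant is "Flatten Group By in Pyspark". However, that approach only reduces the list when the list contains nothing but WrappedArrays of the same type.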
Current
[[shoulder,-1,WrappedArray([shoulder work out,165000], [shoulder pain,165000])],
[shampoo,-1,WrappedArray([purple shampoo,135000])]]
Desired
[[shoulder,-1,[[shoulder work out,165000], [shoulder pain,165000]]],
[shampoo,-1,[[purple shampoo,135000]]]]
I am getting the current format because I am calling a UDF with the following return schema:
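from pyspark.sql.types import (
    ArrayType, IntegerType, LongType, StringType, StructField, StructType
)

# Each element: (reco, count, sq_data), where sq_data is an array of
# (search_query, search_vol) structs.
schema_find_kw = ArrayType(
    StructType(
        [
            StructField("reco", StringType()),
            StructField("count", IntegerType()),
            StructField("sq_data", ArrayType(
                StructType(
                    [
                        StructField("search_query", StringType()),
                        StructField("search_vol", LongType())
                    ]
                )
            ))
        ]
    )
)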
In Python, the UDF returns a list like [[shoulder,-1,[[(shoulder work out,165000)],[(shoulder pain,165000)]]], [shampoo,-1,[[(purple shampoo,135000)]]]].
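For what it's worth, WrappedArray is the Scala collection type Spark uses for an array column, so it shows up when the value is printed on the JVM side. If the goal is simply to get plain nested Python lists on the driver, one possible approach is to collect the value and convert it recursively. The sketch below is only an illustration under assumptions: `to_plain_lists` is a hypothetical helper name, and the `collected` value is a hand-written stand-in that uses plain tuples where PySpark would hand back `Row` objects (`Row` is a tuple subclass, so the same `isinstance` check should cover it).

```python
def to_plain_lists(value):
    """Recursively turn tuples and lists into plain Python lists.

    Assumption: struct values arrive as tuple-like objects (pyspark Row
    subclasses tuple) and array values arrive as Python lists.
    """
    if isinstance(value, (list, tuple)):
        return [to_plain_lists(item) for item in value]
    # Strings, ints and other scalars are returned unchanged.
    return value


# Hand-written stand-in for one collected cell; not real Spark output.
collected = [
    ("shoulder", -1, [("shoulder work out", 165000), ("shoulder pain", 165000)]),
    ("shampoo", -1, [("purple shampoo", 135000)]),
]

plain = to_plain_lists(collected)
print(plain)
```

This only changes the driver-side Python representation. It does not change how the column is stored or displayed inside the DataFrame itself.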