How to concatenate a StringType column with each element of an ArrayType column in pyspark

Time: 2019-12-16 00:30:05

Tags: python apache-spark pyspark

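I have a StringType() column and an ArrayType(StringType()) column in a pyspark DataFrame. I want to prepend the StringType() column's value to every element of the ArrayType(StringType()) column.
Example (col3 is the desired output):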

+-----+---------------------+------------------------------+
|col1 |col2                 |col3                          |
+-----+---------------------+------------------------------+
|'AQQ'|['ABC', 'DEF']       |['AQQABC', 'AQQDEF']          |
|'APP'|['ABC', 'DEF', 'GHI']|['APPABC', 'APPDEF', 'APPGHI']|
+-----+---------------------+------------------------------+

Thanks :)

1 answer:

Answer 0 (score: 1)

For Spark 2.4+, use the transform higher-order function:

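from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

# Not needed in the pyspark shell, where `spark` already exists
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('AQQ', ['ABC', 'DEF']), ('APP', ['ABC', 'DEF', 'GHI'])], ['col1', 'col2'])

# transform applies the lambda to each element x of col2; col1 is taken from the same row
df.withColumn('col3', expr("transform(col2, x -> concat(col1, x))")).show(truncate=False)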
+----+---------------+------------------------+
|col1|col2           |col3                    |
+----+---------------+------------------------+
|AQQ |[ABC, DEF]     |[AQQABC, AQQDEF]        |
|APP |[ABC, DEF, GHI]|[APPABC, APPDEF, APPGHI]|
+----+---------------+------------------------+
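
On Spark versions older than 2.4, where transform is not available, a Python UDF is one possible fallback. The following is only a sketch under assumptions: the helper names prefix_each and with_col3_udf are made up for illustration, and the null handling (a null element or a null col1 yields a null element, a null array yields null) is an attempt to mirror what concat does inside transform, not something stated in the answer above. A UDF will usually be slower than the built-in expression, so prefer transform when it is available.

```python
def prefix_each(prefix, items):
    """Prepend `prefix` to every element of `items` (hypothetical helper).

    A null array gives null; a null prefix or null element gives a null element.
    """
    if items is None:
        return None
    return [None if prefix is None or x is None else prefix + x for x in items]


def with_col3_udf(df):
    """Return `df` with col3 added; expects col1 (string) and col2 (array<string>)."""
    # Imported lazily so prefix_each can be used and tested without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import ArrayType, StringType

    # Wrap the plain Python function as a UDF returning array<string>.
    prefix_each_udf = udf(prefix_each, ArrayType(StringType()))
    return df.withColumn('col3', prefix_each_udf(col('col1'), col('col2')))
```

With the df defined above, with_col3_udf(df).show(truncate=False) should display the same col3 values as the transform version.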