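I have two DataFrames in PySpark, A and B, with the structure shown below. I want to update the values in column colB of DataFrame A using the lookup values in DataFrame B. Is there a built-in function in PySpark for this, or do I have to run pyspark.sql.functions.regexp_replace recursively?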
DataFrame A                 DataFrame B           Resultant
----------------            ---------------       ----------------
| seqn | colB  |            | colX | colY |       | colA | colB  |
----------------            ---------------       ----------------
| s1   | x,y,z |            | p    | c    |       | a    | e,f,g |
| s2   | p,y,r |   takes    | r    | d    |  ==>  | a    | c,f,d |
| s3   | y,z   |   value    | x    | e    |       | a    | f,g   |
| s4   | p,z   |   from     | y    | f    |       | a    | c,g   |
----------------            | z    | g    |       ----------------
                            ---------------