I want to replace a value in one column by building the search string from another column.

Before:
id address st
1 2.PA1234.la 1234
2 10.PA125.la 125
3 2.PA156.ln 156

After:
id address st
1 2.PA9999.la 1234
2 10.PA9999.la 125
3 2.PA9999.ln 156

I tried:
df.withColumn("address", regexp_replace("address","PA"+st,"PA9999"))
df.withColumn("address",regexp_replace("address","PA"+df.st,"PA9999")
Both attempts fail with:
TypeError: 'Column' object is not callable
Answer 0 (score: 0)
You can also use a Spark UDF.
This approach works whenever you need to modify a DataFrame entry using a value from another column in the same row:
import pandas as pd
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# sparkSession is assumed to be an existing SparkSession
pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})
spark_df = sparkSession.createDataFrame(pd_input)
# For each row, replace the value of st found inside address with '9999'
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())
spark_df.withColumn('adress_new', replace_udf(col('address'), col('st'))).show()
Output:
+-----------+----+------------+
|    address|  st|  adress_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
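
One caveat with the lambda above: `str.replace` swaps every occurrence of `st` in the string, so an address such as `125.PA125.la` would also lose its leading `125`. The following is a minimal sketch of a stricter per-row function, assuming the code always follows the literal prefix `PA`. The name `replace_station` and the `new_code` parameter are hypothetical, introduced only for illustration.

```python
import re

def replace_station(address, st, new_code="9999"):
    """Replace 'PA<st>' in address with 'PA<new_code>', leaving other text alone."""
    # Pass null-like inputs through unchanged.
    if address is None or st is None:
        return address
    # re.escape guards against regex metacharacters in st;
    # (?!\d) keeps 'PA125' from matching inside 'PA1250'.
    pattern = "PA" + re.escape(str(st)) + r"(?!\d)"
    # A callable replacement avoids any backslash interpretation in new_code.
    return re.sub(pattern, lambda m: "PA" + new_code, address)

# Sample rows taken from the question.
rows = [("2.PA1234.la", "1234"), ("10.PA125.la", "125"), ("2.PA156.ln", "156")]
results = [replace_station(address, st) for address, st in rows]
print(results)
```

Being plain Python, this function could be wrapped with `udf(replace_station, StringType())` and applied to `col('address')` and `col('st')` in the same way as the lambda.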