In Python (pandas), I replace the leading 0 in the phone column with 91 as shown below. How can I do the same thing in PySpark?
The con dataframe is:
id phone1
1 088976854667
2 089706790002
The output I want:
1 9188976854667
2 9189706790002
# Replace the leading zero in a phone number with 91
con.filter(regex='[_]').replace('^0','91',regex=True)
Answer 0 (score: 2)
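You are looking for the regexp_replace function. It takes 3 arguments: the column, the regular expression pattern to match, and the replacement string: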
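from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Reuse the active session (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
columns = ['id', 'phone1']
vals = [(1, '088976854667'), (2, '089706790002')]
df = spark.createDataFrame(vals, columns)
# Replace a single leading 0 in phone1 with 91
df = df.withColumn('phone1', F.regexp_replace('phone1', "^0", "91"))
df.show()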
Output:
+---+-------------+
| id|       phone1|
+---+-------------+
|  1|9188976854667|
|  2|9189706790002|
+---+-------------+
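To see what the "^0" pattern does without starting Spark, the same substitution can be sketched with Python's standard re module. This is only an illustration of the regex behaviour, not part of the PySpark solution: the helper name replace_leading_zero and the third sample number (one without a leading zero) are made up for the example.

```python
import re


def replace_leading_zero(phone: str, prefix: str = "91") -> str:
    """Replace a single leading '0' with `prefix`.

    The '^' anchor ties the match to the start of the string, so zeros
    elsewhere in the number are left alone, and numbers that do not
    start with '0' are returned unchanged.
    """
    return re.sub(r"^0", prefix, phone)


# The first two values come from the question; the third is a
# hypothetical number with no leading zero.
phones = ["088976854667", "089706790002", "9876500000"]
converted = [replace_leading_zero(p) for p in phones]
print(converted)
```

Because the pattern matches exactly one character, a number starting with several zeros would lose only the first one. If all leading zeros should be replaced, a pattern such as "^0+" would be the likely choice, both here and in regexp_replace.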