Replace column values with a specific string

Time: 2019-06-07 06:15:26

Tags: python apache-spark pyspark

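I have this DataFrame, and I want to replace each number in the Operation column with df.rule followed by that number (1 becomes df.rule1, 2 becomes df.rule2):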

+---+---------+------+
|SNo|Operation|Points|
+---+---------+------+
|  1|    1 & 2|   100|
|  2|    1 | 2|   200|
|  3|1 | 2 & 3|   350|
+---+---------+------+

I would like to turn it into this DataFrame:

+---+------------------------------+------+
|SNo|Operation                     |Points|
+---+------------------------------+------+
|1  |df.rule1 & df.rule2           |100   |
|2  |df.rule1 | df.rule2           |200   |
|3  |df.rule1 | df.rule2 & df.rule3|350   |
+---+------------------------------+------+

2 answers:

Answer 0: (score: 1)

If df is a pandas DataFrame, use pd.Series.replace with regex=True:

df['Operation'].replace(r'(\d)', r'df.rule\1', regex=True)

Output:

0               df.rule1 & df.rule2
1               df.rule1 | df.rule2
2    df.rule1 | df.rule2 & df.rule3
Name: Operation, dtype: object
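
To see what the pattern does independently of pandas, here is a minimal sketch using only the standard library's `re` module. The list `operations` and the multi-digit input `"10 & 2"` are illustrative assumptions, not part of the question. It also shows that `(\d)` matches one digit at a time, so `(\d+)` may be the safer choice if rule numbers can reach 10 or more.

```python
import re

# Sample values copied from the Operation column in the question.
operations = ["1 & 2", "1 | 2", "1 | 2 & 3"]

# (\d+) captures each run of digits; \1 in the replacement re-inserts it.
rewritten = [re.sub(r"(\d+)", r"df.rule\1", op) for op in operations]

for op in rewritten:
    print(op)

# Hypothetical multi-digit case: a single-digit pattern splits "10" in two.
single_digit = re.sub(r"(\d)", r"df.rule\1", "10 & 2")
multi_digit = re.sub(r"(\d+)", r"df.rule\1", "10 & 2")
print(single_digit)  # df.rule1df.rule0 & df.rule2
print(multi_digit)   # df.rule10 & df.rule2
```

For the three rows in the question both patterns give the same result, because every number there is a single digit.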

Answer 1: (score: 1)

Assuming it is a PySpark DataFrame, we can use regexp_replace:

from pyspark.sql import functions as F

# Spark uses Java regex: capture the digit with (\d) and refer to it as $1
df = df.withColumn('Operation', F.regexp_replace('Operation', r'(\d)', 'df.rule$1'))