Experts, this is a minor issue, but I can't get it to work correctly.
+--------------+----------------------------------------------------------+-------------------+
|table         |query                                                     |date               |
+--------------+----------------------------------------------------------+-------------------+
|AGENT         |select * from table where DW_EFFECTIVE_DATE_PARTITION ='X'|2019-12-24 00:00:00|
+--------------+----------------------------------------------------------+-------------------+
All I want to do in this DataFrame is change the query column to:
select * from table where DW_EFFECTIVE_DATE_PARTITION ='2019-12-24 00:00:00'
I tried:
>>> dfX.withColumn('query',regexp_replace('query',"'X'","'" + dfX['d'] + "'")).show()
Traceback (most recent call last):
TypeError: 'Column' object is not callable
Desired output:
+--------------+----------------------------------------------------------------------------+-------------------+
|table         |query                                                                       |date               |
+--------------+----------------------------------------------------------------------------+-------------------+
|AGENT         |select * from table where DW_EFFECTIVE_DATE_PARTITION ='2019-12-24 00:00:00'|2019-12-24 00:00:00|
+--------------+----------------------------------------------------------------------------+-------------------+
Answer 0 (score: 2)
You can use selectExpr instead of withColumn:
>>> df.selectExpr("table","regexp_replace(query, 'X', date) as query", "date").show(truncate=False)
+-----+----------------------------------------------------------------------------+-------------------+
|table|query                                                                       |date               |
+-----+----------------------------------------------------------------------------+-------------------+
|AGENT|select * from table where DW_EFFECTIVE_DATE_PARTITION ='2019-12-24 00:00:00'|2019-12-24 00:00:00|
+-----+----------------------------------------------------------------------------+-------------------+
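One caveat with the pattern in this answer: inside the SQL expression string, 'X' is a string literal whose value is the one-character regex X, so every capital X in the query is replaced. The result looks right here only because the original quotes stay in place around the substituted date and the query contains no other X. The sketch below illustrates the difference with Python's re module rather than Spark (Spark uses Java regular expressions, but for literal patterns like these the matching should behave the same way). The sample query with an extra X in the column name is made up for illustration.

```python
import re

# Hypothetical query whose column name also contains a capital X.
query = "select * from t where X_PARTITION ='X'"
date = "2019-12-24 00:00:00"

# Pattern X: every capital X is replaced, including the one in the column name.
loose = re.sub("X", date, query)

# Pattern 'X' (quotes included): only the quoted placeholder is replaced,
# so the quotes have to be put back around the date.
strict = re.sub("'X'", "'" + date + "'", query)

print(loose)
print(strict)
```

If the queries may contain other capital X characters, matching the quoted placeholder is the safer choice.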
Answer 1 (score: 1)
Use regexp_replace together with expr, which lets you replace the string with the value of another column:
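from pyspark.sql.functions import expr

# Replace the quoted placeholder 'X' with the quoted value of the date column.
replace_expr = """regexp_replace(query,"'X'",concat("'", date, "'"))"""
df.withColumn("query", expr(replace_expr)).show(truncate=False)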
This gives:
+-----+----------------------------------------------------------------------------+-------------------+
|table|query                                                                       |date               |
+-----+----------------------------------------------------------------------------+-------------------+
|AGENT|select * from table where DW_EFFECTIVE_DATE_PARTITION ='2019-12-24 00:00:00'|2019-12-24 00:00:00|
+-----+----------------------------------------------------------------------------+-------------------+
Answer 2 (score: 0)
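from pyspark.sql.types import StringType

# Map each input string to its replacement value.
def replace_string(s):
    if s == "A":
        return "a"
    else:
        return "b"

# Register the function as a UDF (spark is an existing SparkSession) and apply it to a column.
replace_string_udf = spark.udf.register("replace_string", replace_string, StringType())
df = df.withColumn("new_column", replace_string_udf("old_column_name"))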
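The UDF above is a generic template and does not touch the query or date columns from the question. Adapted to that case, a minimal sketch might look like the following. The names fill_placeholder and with_filled_query are hypothetical, and the code assumes the DataFrame has query and date columns as in the question. A Python UDF is usually slower than the built-in functions used in the other answers, so this is mainly worth considering when the replacement logic is too complex for a SQL expression.

```python
def fill_placeholder(query, date):
    """Replace the quoted placeholder 'X' in query with the quoted date."""
    # Leave the row unchanged if either value is missing.
    if query is None or date is None:
        return query
    # Plain string replacement, no regex needed for a fixed placeholder.
    return query.replace("'X'", "'" + str(date) + "'")


def with_filled_query(df):
    """Return df with its query column rewritten by a UDF (hypothetical helper)."""
    # Imported here so fill_placeholder can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    fill_udf = udf(fill_placeholder, StringType())
    # The UDF takes two columns, referenced here by name.
    return df.withColumn("query", fill_udf("query", "date"))
```

Usage would be along the lines of with_filled_query(df).show(truncate=False).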