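I am trying to find and replace values inside a string column, using another column as the lookup.
I have two columns, labels and choice.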
Table 1
id = 12
labels = case1|case2|case3
Table 2
id = 12
label&values = case1coke.case2fanta:case3cheez
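The example above is in English, but the real label&values and labels columns are in Japanese. I tried regexp_replace, but because of the data volume and the many special characters in the data, regexp_replace does not work for me. I am looking for a way to solve this with plain string matching instead.
Expected output: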
id label value
12 case1 coke
12 case2 fanta
12 case3 juice
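To make the "plain string matching" idea concrete, here is a minimal sketch in plain Python. It relies only on literal substring search (`str.find`), so special characters inside a label never need escaping. The function name `split_by_labels`, the `trim` separator set, and the assumption that labels occur in `choice` in the same order as in the labels column are all mine, not something confirmed by the data.

```python
def split_by_labels(labels, choice, label_sep="|", trim=" .:：,、"):
    """Pair each label with the text that follows it in `choice`.

    Hypothetical helper: uses literal substring search only, so regex
    metacharacters inside labels are treated as ordinary characters.
    Assumes labels appear in `choice` in the order they are listed.
    """
    # Locate every label, scanning left to right from the previous hit.
    hits = []
    cursor = 0
    for label in labels.split(label_sep):
        start = choice.find(label, cursor)
        if start == -1:
            continue  # label absent from this record
        hits.append((label, start, start + len(label)))
        cursor = start + len(label)

    # A value runs from the end of its label to the start of the next label.
    pairs = []
    for i, (label, _, value_start) in enumerate(hits):
        value_end = hits[i + 1][1] if i + 1 < len(hits) else len(choice)
        # Strip the assumed separator characters around the value.
        pairs.append((label, choice[value_start:value_end].strip(trim)))
    return pairs


rows = split_by_labels("case1|case2|case3", "case1coke.case2fanta:case3cheez")
for label, value in rows:
    print(12, label, value)
```

On a DataFrame, a function like this could probably be wrapped in a PySpark UDF that returns an array of (label, value) structs and then exploded into rows. I have not measured how that performs on data of this size.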
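from pyspark.sql import functions as f

# Mark each label in `choice` as ⚙label⚛ so it can be split on later.
# Triple quotes keep the newlines, so the SQL comment ends at its own line
# (with backslash continuations it would swallow the rest of the query).
df = sqlContext.sql("""
    select
        a.shop_id
        ,a.item_id
        ,regexp_replace(
            regexp_replace(
                a.choice
                ,concat('(^|(?<![::]))(', concatlables, ')')  -- this is not working for all of the japanese records in this case
                ,'⚙$2⚛'
            )
            ,'⚛[::]' ,'⚛'
        ) as choice
    from
        rdsp_production_production_ex_odin_mall.basket_main2 a
        inner join brandmart.control_labels_concatlabels b
            on a.shop_id = b.shop_id
            and a.item_id = b.item_id
    where a.reg_date > '2019-02-07'
""")

# One row per ⚙-delimited chunk; each chunk looks like "label⚛value".
r = df.select(
    "shop_id",
    "item_id",
    f.split("choice", "⚙").alias("final"),
    f.posexplode(f.split("choice", "⚙")).alias("pos", "val"),
)
# `choice` is not selected above, so split the exploded `val` column instead.
split_col = f.split(r["val"], "⚛")
r = r.withColumn("NAME1", split_col.getItem(0))
r = r.withColumn("NAME2", split_col.getItem(1))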
Error:
[Stage 23:> (6 + 2) / 2977] 19/02/14 11:45:58 WARN scheduler.TaskSetManager: Lost task 972.0 in stage 23.0 (TID 54, bhdp4411.prod.hnd1.bdd.local, executor 2): org.apache.spark.SparkException:
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:204)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(…)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.regex.PatternSyntaxException: near index 79
(^|(?
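The PatternSyntaxException suggests, though I cannot confirm it from the truncated message, that some labels contain regex metacharacters, so the concatenated label string is no longer a valid pattern. Below is a small plain-Python sketch of escaping each label before joining them into one alternation. The helper name `build_label_pattern` and the sample labels are made up for illustration. Spark's `regexp_replace` uses Java regular expressions, where the analogous step would be quoting each label with `\Q...\E`. I have not tested that against this table.

```python
import re


def build_label_pattern(labels, label_sep="|"):
    """Join labels into one alternation, escaping each label first.

    Hypothetical helper for illustration; labels are treated as literals.
    """
    parts = labels.split(label_sep)
    # Longest first, so a label that is a prefix of another does not win.
    parts.sort(key=len, reverse=True)
    return "(" + "|".join(re.escape(p) for p in parts) + ")"


# Made-up labels containing characters that are special in a regex.
pattern = re.compile(build_label_pattern("size(cm)|color+tone|case1"))

# Wrap every matched label in the same ⚙ ... ⚛ markers used in the query.
marked = pattern.sub(lambda m: "⚙" + m.group(1) + "⚛", "size(cm)10.color+tone:red")
print(marked)
```

Without the escaping step, a label such as `size(cm` would leave an unbalanced group in the pattern and fail to compile.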