I am trying to delete duplicate records. My code seemed to work a few days ago, but it has started failing.
Here is what I have tried:
sdf_sql(spark,'DELETE pred FROM TB1 pred
INNER JOIN TB2 pred2
WHERE pred.last_upd < pred2.last_upd AND pred.id = pred2.id')
This is the error message I get:
Error: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'pred' expecting 'FROM'(line 1, pos 7)
== SQL ==
DELETE pred FROM TB1 pred
-------^^^
INNER JOIN TB2 pred2
WHERE pred.last_upd < pred2.last_upd AND pred.id = pred2.id
Answer 0 (score: 1)
Try the following code:
DELETE pred FROM contacts pred
INNER JOIN contacts t2
WHERE pred.id > t2.id
  AND pred.email = t2.email;
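Note that the `DELETE alias FROM ... JOIN` form above is MySQL's multi-table DELETE syntax, which is exactly what the Spark parser rejected in the question. The same "keep only the newest row per id" logic can be expressed with a standard correlated-subquery DELETE, which more SQL dialects accept. As a minimal, self-contained sketch (shown with Python's built-in `sqlite3` purely for illustration; the `id`/`last_upd` column names follow the question, and the sample data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tb (id INTEGER, last_upd INTEGER)")
# Two rows share id=1; the one with the smaller last_upd is the stale duplicate.
con.executemany("INSERT INTO tb VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# Delete any row for which a newer row with the same id exists.
con.execute("""
    DELETE FROM tb
    WHERE EXISTS (
        SELECT 1 FROM tb AS newer
        WHERE newer.id = tb.id
          AND newer.last_upd > tb.last_upd
    )
""")

rows = sorted(con.execute("SELECT id, last_upd FROM tb"))
print(rows)  # [(1, 20), (2, 5)] -- only the newest row per id survives
```

Whether this exact DELETE runs in your Spark setup depends on the table format (plain Spark SQL tables do not support DELETE; formats such as Delta Lake do), so treat this as an illustration of the dedup condition rather than a drop-in fix.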