Delete duplicate records, keeping the most recent

Date: 2019-08-13 13:32:53

Tags: sql apache-spark-sql hiveql sparklyr

I'm trying to delete duplicate records. My code seemed to work a few days ago, but it has started failing.

Here are some of the things I have tried:

sdf_sql(spark,'DELETE pred FROM TB1 pred 
INNER JOIN TB2 pred2
WHERE pred.last_upd < pred2.last_upd AND pred.id = pred2.id')

This is the error message I receive:

Error: org.apache.spark.sql.catalyst.parser.ParseException: 
extraneous input 'pred' expecting 'FROM'(line 1, pos 7)

== SQL ==
DELETE pred FROM TB1 pred 
-------^^^
INNER JOIN TB2 pred2
WHERE pred.last_upd < pred2.last_upd AND pred.id = pred2.id
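
This parse error is expected: Spark SQL (as of Spark 2.x) has no DELETE statement for ordinary tables or views, so the parser rejects everything after the DELETE keyword, and a MySQL-style DELETE ... JOIN cannot be run through sdf_sql at all. The usual workaround is to select the rows to keep rather than delete the rows to drop. A minimal sketch of that shape, assuming TB1 has an id key and a last_upd timestamp as in the query above; the helper column rn is introduced by this sketch and would need to be dropped afterwards:

# Keep only the newest row per id by ranking rows within each id group.
deduped <- sdf_sql(spark, "
  SELECT * FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY id ORDER BY last_upd DESC) AS rn
    FROM TB1 t
  ) ranked
  WHERE rn = 1
")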

1 Answer:

Answer 0 (score: 1)

Try the following code:

DELETE pred FROM contacts pred
INNER JOIN contacts t2
WHERE pred.id > t2.id AND pred.email = t2.email;

http://www.mysqltutorial.org/mysql-delete-duplicate-rows/
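
Note that this is MySQL's multi-table DELETE syntax (the linked tutorial targets MySQL), and Spark SQL's parser will reject it with the same ParseException shown in the question. In sparklyr, one workaround is to materialize the deduplicated rows and swap them in for the original table. A hedged sketch, assuming TB1 is a Hive-managed table with columns id and last_upd, and using a hypothetical staging table TB1_dedup (Spark refuses to overwrite a table that the same query is still reading from, hence the staging step); the max() self-join keeps the schema of TB1 unchanged, though it retains every row tied on the latest last_upd:

library(sparklyr)
library(DBI)

# Select the most recent row per id via a self-join on max(last_upd).
deduped <- sdf_sql(spark, "
  SELECT a.* FROM TB1 a
  INNER JOIN (SELECT id, max(last_upd) AS max_upd
              FROM TB1 GROUP BY id) b
    ON a.id = b.id AND a.last_upd = b.max_upd
")

# Stage the result in a new table, then replace the original.
spark_write_table(deduped, "TB1_dedup")
dbExecute(spark, "DROP TABLE TB1")
dbExecute(spark, "ALTER TABLE TB1_dedup RENAME TO TB1")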