Question

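There are many similar questions, but I couldn't find a suitable answer.

My EntryVote model has the fields user_id, entry_id and a few others.

I want to write a simple rake task that removes duplicates within each (user_id, entry_id) group. It doesn't matter which record of a group is kept. What is the best way to do this?

For example: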

id, user_id, entry_id
1,1,1
2,1,1
3,1,1
4,5,6
5,5,6
6,7,7

I want to end up with:

1,1,1
4,5,6
6,7,7

I know how to select the (user_id, entry_id) groups that contain duplicates, but I don't know how to work with the result afterwards:

EntryVote.select('user_id, entry_id').group('user_id,entry_id').having('count(*) > 1')

Answer 1

It may not be the best solution, but try the following:

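# Count rows per (user_id, entry_id) pair; the hash keys are [user_id, entry_id] arrays.
EntryVote.group(:user_id, :entry_id).count.each do |(user_id, entry_id), count|
  if count > 1
    votes = EntryVote.where(user_id: user_id, entry_id: entry_id)
    # Keep the row with the lowest id and delete the rest of the group.
    votes.where.not(id: votes.minimum(:id)).delete_all
  end
end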

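Alternatively, you can add a validation that checks the uniqueness of user_id and entry_id, and then try to save each record. If a record fails to save because of that validation, just delete it. I'm pretty sure this is slower than the first option :)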
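To see which rows the validation idea would leave behind, here is a plain-Ruby sketch that does not touch the database. `Vote` and `unique_pair?` are hypothetical stand-ins for the model and for a uniqueness check scoped to (user_id, entry_id). The assumption is that such a check treats a row as invalid whenever any other remaining row has the same pair.

```ruby
# Hypothetical in-memory stand-in for the entry_votes table (not the real model).
Vote = Struct.new(:id, :user_id, :entry_id)

rows = [
  Vote.new(1, 1, 1), Vote.new(2, 1, 1), Vote.new(3, 1, 1),
  Vote.new(4, 5, 6), Vote.new(5, 5, 6),
  Vote.new(6, 7, 7)
]

# Assumed behaviour of a scoped uniqueness check: a row passes only if
# no *other* remaining row has the same (user_id, entry_id) pair.
def unique_pair?(row, rows)
  rows.none? do |other|
    other.id != row.id &&
      other.user_id == row.user_id &&
      other.entry_id == row.entry_id
  end
end

# Walk the rows in id order and drop every row that fails the check.
remaining = rows.dup
rows.each do |row|
  remaining.delete(row) unless unique_pair?(row, remaining)
end

puts "surviving ids: #{remaining.map(&:id).inspect}"
```

On the question's sample data this leaves ids 3, 5 and 6: the last row of each group survives rather than the first, which is acceptable here because the question does not care which record is kept.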

Answer 2

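If you want the entry_id and user_id columns to be unique foreign keys, the following rake task, which contains a special SQL delete statement, will help: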

  # Keeps the row with the lowest id in each (user_id, entry_id) group.
  # The multi-table "delete e1 from ..." form is MySQL-style syntax;
  # other databases may need a different statement.
  task 'delete_duplicates' => :environment do
    puts "Removing duplicates in table entry_votes"
    puts "Entries before: #{n1 = EntryVote.count}"
    sql = "delete e1 from entry_votes e1, entry_votes e2 " +
          "where (e1.user_id = e2.user_id) and (e1.entry_id = e2.entry_id) " +
          "and (e1.id > e2.id);"
    ActiveRecord::Base.connection.execute(sql)
    puts "Entries after: #{n2 = EntryVote.count}, #{n1 - n2} duplicates removed"
  end
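The self-join condition in that delete statement can be hard to read, so here is a minimal plain-Ruby sketch of the same predicate applied to the question's sample rows. It only illustrates which ids the statement is meant to remove; the array layout is an assumption made for the example and no database is involved.

```ruby
# Rows as [id, user_id, entry_id], copied from the question's example.
rows = [
  [1, 1, 1], [2, 1, 1], [3, 1, 1],
  [4, 5, 6], [5, 5, 6],
  [6, 7, 7]
]

# A row e1 is deleted when some row e2 has the same (user_id, entry_id)
# and a smaller id, mirroring "e1.id > e2.id" in the SQL.
deleted = rows.select do |id1, user1, entry1|
  rows.any? do |id2, user2, entry2|
    user1 == user2 && entry1 == entry2 && id1 > id2
  end
end

survivors = rows - deleted

puts "deleted ids:   #{deleted.map(&:first).inspect}"
puts "surviving ids: #{survivors.map(&:first).inspect}"
```

For the sample data this removes ids 2, 3 and 5 and keeps 1, 4 and 6, which matches the result the question asks for.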

See also this SO question about duplicates, or this article on how to delete duplicates using SQL.

Deleting duplicates with a rake task
