I have the following table:
relations
[id,user_id,status]
1,2,sent_reply
1,2,sent_mention
1,3,sent_mention
1,4,sent_reply
1,4,sent_mention
I'm looking for a way to remove the duplicates so that only the following rows remain:
1,2,sent_reply
1,3,sent_mention
1,4,sent_reply
(preferably using Rails)
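Answer 0: (score: 3)
I know this is very late, but I found a good way to do it using Rails 3. There may be better ways, and I don't know how this would perform with 100,000+ rows of data, but it should put you on the right track.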
# Get a hash of all id/user_id pairs and how many records of each pair
counts = ModelName.group([:id, :user_id]).count
# => {[1, 2]=>2, [1, 3]=>1, [1, 4]=>2}
# Keep only those pairs that have more than one record
dupes = counts.select{|attrs, count| count > 1}
# => {[1, 2]=>2, [1, 4]=>2}
# Map objects by the attributes we have
object_groups = dupes.map do |attrs, count|
  ModelName.where(:id => attrs[0], :user_id => attrs[1])
end
# Take each group and #destroy the records you want.
# Or call #delete instead to save time if you don't need ActiveRecord callbacks
# Here I'm just keeping the first one I find.
object_groups.each do |group|
  group.each_with_index do |object, index|
    object.destroy unless index == 0
  end
end
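The group-then-keep-the-first idea above can be tried without a database. The following is only a minimal sketch in plain Ruby: `Row`, `ROWS`, and `partition_duplicates` are hypothetical names made up for illustration, and the rows live in an in-memory array instead of an ActiveRecord model.

```ruby
# Hypothetical in-memory stand-in for the table; not an ActiveRecord model.
Row = Struct.new(:id, :user_id, :status)

ROWS = [
  Row.new(1, 2, "sent_reply"),
  Row.new(1, 2, "sent_mention"),
  Row.new(1, 3, "sent_mention"),
  Row.new(1, 4, "sent_reply"),
  Row.new(1, 4, "sent_mention")
].freeze

# Returns [keepers, duplicates], keeping the first row seen
# for each (id, user_id) pair and treating the rest as duplicates.
def partition_duplicates(rows)
  groups = rows.group_by { |r| [r.id, r.user_id] }.values
  keepers = groups.map(&:first)
  duplicates = groups.flat_map { |group| group.drop(1) }
  [keepers, duplicates]
end

keepers, duplicates = partition_duplicates(ROWS)
keepers.each { |r| puts [r.id, r.user_id, r.status].join(",") }
# prints:
# 1,2,sent_reply
# 1,3,sent_mention
# 1,4,sent_reply
```

In the Rails version, the rows in `duplicates` are the ones that would be passed to `destroy` or `delete`.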
Answer 1: (score: -1)
This is best done in SQL. But if you prefer to use Rails:
(Relation.all - Relation.all.uniq_by{|r| [r.user_id, r.status]}).each{ |d| d.destroy }
or
ids = Relation.all.uniq_by{|r| [r.user_id, r.status]}.map(&:id)
Relation.where("id NOT IN (?)", ids).destroy_all # or delete_all, which is faster
But I don't like this solution :D
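The choice of uniqueness key matters here. As a rough sketch in plain Ruby (no Rails), the snippet below uses `Array#uniq` with a block, which behaves like `uniq_by` on Ruby 1.9.2 and later. `Record`, `RECORDS`, `pks_to_delete`, and the `pk` column are all hypothetical: `pk` stands in for a unique primary key, which the table in the question does not show.

```ruby
# Hypothetical in-memory rows; pk is an assumed unique primary key.
Record = Struct.new(:pk, :id, :user_id, :status)

RECORDS = [
  Record.new(10, 1, 2, "sent_reply"),
  Record.new(11, 1, 2, "sent_mention"),
  Record.new(12, 1, 3, "sent_mention"),
  Record.new(13, 1, 4, "sent_reply"),
  Record.new(14, 1, 4, "sent_mention")
].freeze

# Primary keys of the rows to delete: every row except the first
# one seen for each value of the key block.
def pks_to_delete(records, &key)
  keep = records.uniq(&key).map(&:pk)
  records.map(&:pk) - keep
end

by_user_status = pks_to_delete(RECORDS) { |r| [r.user_id, r.status] }
by_id_user     = pks_to_delete(RECORDS) { |r| [r.id, r.user_id] }

p by_user_status # => []
p by_id_user     # => [11, 14]
```

With the sample data, every `[user_id, status]` pair is already unique, so that key removes nothing; keying on `[id, user_id]` leaves the three rows the question asks for.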