How do I delete duplicates in MySQL using Rails?

Time: 2011-04-12 16:26:35

Tags: mysql sql ruby-on-rails

I have the following table:

relations

[id,user_id,status]
1,2,sent_reply
1,2,sent_mention
1,3,sent_mention
1,4,sent_reply
1,4,sent_mention

I'm looking for a way to delete the duplicates so that only the following rows remain:

1,2,sent_reply
1,3,sent_mention
1,4,sent_reply

(Preferably using Rails.)

2 answers:

Answer 0 (score: 3)

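I know this is very late, but I found a good way to do it with Rails 3. There may well be better approaches, and I don't know how this performs on 100,000+ rows, but it should put you on the right track.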

# Get a hash of all id/user_id pairs and how many records of each pair
counts = ModelName.group([:id, :user_id]).count
# => {[1, 2]=>2, [1, 3]=>1, [1, 4]=>2}

# Keep only those pairs that have more than one record
dupes = counts.select{|attrs, count| count > 1}
# => {[1, 2]=>2, [1, 4]=>2}

# Map objects by the attributes we have
object_groups = dupes.map do |attrs, count|
  ModelName.where(:id => attrs[0], :user_id => attrs[1])
end

# Take each group and #destroy the records you want.
# Or call #delete instead to save time if you don't need ActiveRecord callbacks
# Here I'm just keeping the first one I find.
object_groups.each do |group|
  group.each_with_index do |object, index|
    object.destroy unless index == 0
  end
end
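
To make it easier to see which rows survive, here is a minimal plain-Ruby sketch of the same grouping logic run against the sample data from the question. It does not touch the database: SampleRow is a hypothetical stand-in for the model, and group_by on an array stands in for the grouped query, so treat it as an illustration rather than a replacement for the code above.

```ruby
# Hypothetical stand-in for the ActiveRecord model; not part of the original answer.
SampleRow = Struct.new(:id, :user_id, :status)

sample_rows = [
  SampleRow.new(1, 2, "sent_reply"),
  SampleRow.new(1, 2, "sent_mention"),
  SampleRow.new(1, 3, "sent_mention"),
  SampleRow.new(1, 4, "sent_reply"),
  SampleRow.new(1, 4, "sent_mention"),
]

# Group the rows by their [id, user_id] pair, mirroring the grouped count above.
groups = sample_rows.group_by { |row| [row.id, row.user_id] }

# In each group the first row is kept; every later row is a duplicate to remove.
kept_rows = groups.values.map(&:first)
rows_to_remove = groups.values.flat_map { |group| group.drop(1) }

kept_rows.each { |row| puts row.to_a.join(",") }
```

In this sketch "first" means first in array order. Against a real table, the order of rows within a group is not guaranteed unless the query specifies one, so adding an explicit order before keeping the first record is probably worth considering.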

Answer 1 (score: -1)

This is best done in SQL. But if you prefer to use Rails:

(Relation.all - Relation.all.uniq_by{|r| [r.user_id, r.status]}).each{ |d| d.destroy }

# Alternatively, collect the ids of the records to keep and remove everything else
ids = Relation.all.uniq_by{|r| [r.user_id, r.status]}.map(&:id)
Relation.where("id NOT IN (?)", ids).destroy_all # or delete_all, which is faster

But I don't like this solution :D
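
One thing worth checking before running code like this is which key decides what counts as a duplicate. The plain-Ruby sketch below applies Array#uniq with a block (the core Ruby counterpart of uniq_by) to the sample rows from the question. RelationRow is a hypothetical stand-in for the Relation model, and the choice of user_id as the key is an assumption based on the rows the question wants to keep.

```ruby
# Hypothetical stand-in for the Relation model; not part of the original answer.
RelationRow = Struct.new(:id, :user_id, :status)

relation_rows = [
  RelationRow.new(1, 2, "sent_reply"),
  RelationRow.new(1, 2, "sent_mention"),
  RelationRow.new(1, 3, "sent_mention"),
  RelationRow.new(1, 4, "sent_reply"),
  RelationRow.new(1, 4, "sent_mention"),
]

# Keyed on [user_id, status], every sample row is already unique,
# so nothing would be treated as a duplicate.
unique_by_user_and_status = relation_rows.uniq { |row| [row.user_id, row.status] }

# Keyed on user_id alone, uniq keeps the first row seen for each user.
unique_by_user = relation_rows.uniq { |row| row.user_id }

puts unique_by_user_and_status.size
unique_by_user.each { |row| puts row.to_a.join(",") }
```

On this sample data the [user_id, status] key leaves all five rows in place, while keying on user_id alone yields the three rows the question asks for. Note also that every sample row has the same id, so a filter based on id could not tell the rows apart on a table like this.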