How to remove duplicates from a joined result set in PostgreSQL / Rails

Date: 2014-03-03 11:25:40

Tags: sql ruby-on-rails ruby postgresql ruby-on-rails-4

I have the following query:

SELECT "users".* 
  FROM "users" 
       INNER JOIN "users_roles" 
                  ON "users_roles"."user_id" = "users"."id" 
       INNER JOIN "roles" 
                  ON "roles"."id" = "users_roles"."role_id" 
       LEFT JOIN events_users 
                  ON events_users.user_id = users.id 
       LEFT JOIN events 
                  ON events.id = events_users.event_id 
       LEFT JOIN booths 
                  ON booths.user_id = users.id 
GROUP BY 
       users.id, 
       roles.id, 
       events.id, 
       booths.id
ORDER BY 
       id ASC

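I would like to remove the duplicates, but the query seems to return the same user several times whenever that user has more than one role, booth, or event.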

Here is the Rails Active Record call that generates the SQL query above:

users = User.
      joins(:roles).
      joins("LEFT JOIN events_users ON events_users.user_id = users.id LEFT JOIN events ON events.id = events_users.event_id").
      joins("LEFT JOIN booths ON booths.user_id = users.id").
      group("users.id, roles.id, events.id").
      order("#{sort_column} #{sort_direction}")

I have also tried the following, with no luck:

users = User.
      joins(:roles).
      joins("LEFT JOIN events_users ON events_users.user_id = users.id LEFT JOIN events ON events.id = events_users.event_id").
      joins("LEFT JOIN booths ON booths.user_id = users.id").
      group("users.id, roles.id, events.id").
      order("#{sort_column} #{sort_direction}").
      select("distinct on(users.id, roles.id, events.id, booths.id) users.*")

Is there a way to remove all the duplicates from the result set?

4 answers:

Answer 0: (score: 2)

Try using the DISTINCT clause in your select statement. It is almost always better to leave this kind of work to SQL.

SELECT DISTINCT "users".* 
FROM "users" 
INNER JOIN "users_roles" 
  ON "users_roles"."user_id" = "users"."id" 
INNER JOIN "roles" 
  ON "roles"."id" = "users_roles"."role_id" 
LEFT JOIN events_users 
  ON events_users.user_id = users.id 
LEFT JOIN events 
  ON events.id = events_users.event_id 
LEFT JOIN booths 
  ON booths.user_id = users.id 
GROUP BY users.id, 
         roles.id, 
         events.id, 
         booths.id
ORDER BY id ASC
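
To make it clearer why the joins repeat each user and what DISTINCT does to those rows, here is a minimal pure-Ruby sketch. It only imitates the joins with in-memory arrays; the table contents and the names `with_roles`, `joined`, `selected`, and `distinct` are made up for illustration and are not part of the original question.

```ruby
# Hypothetical in-memory stand-ins for the tables; all data is made up.
users        = [{ id: 1, name: "alice" }, { id: 2, name: "bob" }]
users_roles  = [{ user_id: 1, role_id: 10 }, { user_id: 1, role_id: 11 },
                { user_id: 2, role_id: 10 }]
events_users = [{ user_id: 1, event_id: 100 }, { user_id: 1, event_id: 101 },
                { user_id: 1, event_id: 102 }]

# INNER JOIN users_roles: one row per (user, role) pair.
with_roles = users.flat_map do |u|
  users_roles.select { |ur| ur[:user_id] == u[:id] }.map { |ur| [u, ur] }
end

# LEFT JOIN events_users: keep the row even when no event matches.
joined = with_roles.flat_map do |u, ur|
  matches = events_users.select { |eu| eu[:user_id] == u[:id] }
  matches = [nil] if matches.empty?
  matches.map { |eu| [u, ur, eu] }
end

# SELECT "users".* keeps only the user columns of each joined row.
selected = joined.map { |u, _ur, _eu| u }

# SELECT DISTINCT "users".* collapses identical user rows.
distinct = selected.uniq

puts selected.size # 2 roles x 3 events for alice, plus 1 row for bob
puts distinct.size # one row per user
```

In this sketch alice appears six times before deduplication because every role row is paired with every event row, which is the same fan-out the question describes.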

Answer 1: (score: 0)

Try this query in PostgreSQL to delete all duplicate rows:

delete from table1 where ctid not in
(select max(t1.id) from
(select ctid id,* from table1)t1
group by t1.name,t1.family);

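To delete duplicate rows you need a value that is unique for each row. In PostgreSQL, ctid serves as a unique value for every row in a table, so we can use ctid to delete all the duplicate rows.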

SELECT DISTINCT * FROM
(SELECT "users".* 
FROM "users" 
INNER JOIN "users_roles" ON "users_roles"."user_id" = "users"."id" 
INNER JOIN "roles" ON "roles"."id" = "users_roles"."role_id" 
LEFT JOIN events_users ON events_users.user_id = users.id 
LEFT JOIN events ON events.id = events_users.event_id 
LEFT JOIN booths ON booths.user_id = users.id 
GROUP BY users.id, roles.id, events.id, booths.id
ORDER BY id asc)t1;

SQL Fiddle

Answer 2: (score: 0)

I don't really know the SQL solution, but I think a pure Ruby solution would be to use the uniq method of Array.

Here is the documentation: http://www.ruby-doc.org/core-2.1.1/Array.html#method-i-uniq

This method lets you remove all duplicates from an array.
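
As an illustration of how uniq behaves, here is a small sketch with made-up data (the arrays below are hypothetical and do not come from the question). It shows both the plain form and the block form, which deduplicates by a chosen key and keeps the first element seen for each key.

```ruby
# Made-up data: the same user id appears once per joined role/event row.
user_ids = [1, 1, 1, 2, 3, 3]
unique_ids = user_ids.uniq
p unique_ids # => [1, 2, 3]

# With a block, uniq compares the block's return value and keeps the
# first element seen for each distinct value.
rows = [
  { user_id: 1, role: "admin" },
  { user_id: 1, role: "editor" },
  { user_id: 2, role: "admin" }
]
unique_rows = rows.uniq { |row| row[:user_id] }
p unique_rows.map { |row| row[:role] } # => ["admin", "admin"]
```

Note that this removes duplicates only after all rows have been loaded into Ruby, so the database still returns the full joined result.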

Hope it helps!

Answer 3: (score: 0)

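I'm new to Ruby and not very comfortable manipulating the database, so I preferred a pure Ruby solution. I have an Assignment join table with :listing_id and :school_id. My code had created hundreds of thousands of duplicate entries, so school.listings returned many duplicate listings. First I fixed the code by using Assignment.find_or_create_by instead of Assignment.create, then I removed the duplicate entries with the rake task below. Removing the duplicates took 30 minutes, so there is surely a better way to do this, but I'm happy with the result because it works.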

desc "remove duplicate relationships in Assignment"
task :clean_assignment => :environment do

  listing_ids = Assignment.pluck(:listing_id)
  listing_ids = listing_ids.uniq

  listing_ids.each do |listing_id|
    count = 0
    assignments = Assignment.where(:listing_id => listing_id)
    school_ids = []
    assignments.each do |assign|
      if school_ids.include?(assign.school_id)
        assign.destroy
        count += 1
      else
        school_ids << assign.school_id
      end
    end
    if count > 0
      p "#{count} duplicates deleted from #{listing_id}"
    end
  end
end

After the rake task finished, I checked that there were no duplicates:

a = Assignment.pluck(:listing_id, :school_id)
b = a.uniq

irb(main):023:0> a.count
=> 191350
irb(main):024:0> b.count
=> 191350

The duplicates are gone.
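
One possible way to cut down on the per-listing queries in the rake task is to find every duplicate id in a single pass over the rows. The sketch below is only an illustration under stated assumptions: `Row` is a hypothetical stand-in for Assignment records, and the data is made up. In a Rails app the triples could presumably be loaded with something like Assignment.pluck(:id, :listing_id, :school_id).

```ruby
# Hypothetical stand-in for Assignment rows (id, listing_id, school_id).
Row = Struct.new(:id, :listing_id, :school_id)

rows = [
  Row.new(1, 10, 100),
  Row.new(2, 10, 100), # duplicate of id 1
  Row.new(3, 10, 101),
  Row.new(4, 11, 100),
  Row.new(5, 11, 100), # duplicate of id 4
  Row.new(6, 11, 100)  # duplicate of id 4
]

# Group rows by the pair that should be unique, keep the lowest id in
# each group, and collect the ids of everything else.
duplicate_ids = rows
  .group_by { |r| [r.listing_id, r.school_id] }
  .values
  .flat_map { |group| group.sort_by(&:id).drop(1).map(&:id) }

p duplicate_ids # => [2, 5, 6]
```

The collected ids could then be removed in one statement, for example with Assignment.where(id: duplicate_ids).delete_all, keeping in mind that delete_all skips callbacks, unlike the destroy call used in the rake task.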