I have 2 models, Question and Tag, with a HABTM association between them. They share a join table, questions_tags.
Feast your eyes on this bad boy:
1.9.3p392 :011 > Question.count
(852.1ms) SELECT COUNT(*) FROM "questions"
=> 417
1.9.3p392 :012 > Tag.count
(197.8ms) SELECT COUNT(*) FROM "tags"
=> 601
1.9.3p392 :013 > Question.connection.execute("select count(*) from questions_tags").first["count"].to_i
(648978.7ms) select count(*) from questions_tags
=> 39919778
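I assume the questions_tags join table contains a bunch of duplicate records; otherwise I have no idea why it would be so large.
How do I clean up that join table so that it only contains unique rows? Or how can I check whether there are duplicate records in it?
Edit 1
I am using PostgreSQL, and this is the schema for the join table questions_tags: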
create_table "questions_tags", :id => false, :force => true do |t|
  t.integer "question_id"
  t.integer "tag_id"
end
add_index "questions_tags", ["question_id"], :name => "index_questions_tags_on_question_id"
add_index "questions_tags", ["tag_id"], :name => "index_questions_tags_on_tag_id"
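Answer 0 (score: 2)
I'm adding this as a new answer because it is quite different from my previous one. This one assumes you do not have an id column on the join table. It creates a new table, selects the unique rows into it, then drops the old table and renames the new one. This will be much faster than anything involving subselects.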
foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           1 |      2
           2 |      1
           2 |      2
           1 |      1
           1 |      1
(5 rows)
foo=# select distinct question_id, tag_id into questions_tags_tmp from questions_tags;
SELECT 4
foo=# select * from questions_tags_tmp;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)
foo=# drop table questions_tags;
DROP TABLE
foo=# alter table questions_tags_tmp rename to questions_tags;
ALTER TABLE
foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)
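One thing the transcript above does not show: DROP TABLE also removes the two indexes from the question's schema, so they have to be recreated on the renamed table. The Ruby sketch below only builds the SQL strings for the whole sequence. rebuild_join_table_sql is a hypothetical helper, not a Rails API, and the final unique index is an added assumption about how to keep duplicates from coming back.

```ruby
# Hypothetical helper (not part of Rails): returns the SQL statements that
# rebuild a join table so it holds only distinct rows.
def rebuild_join_table_sql(table, columns)
  tmp  = "#{table}_tmp"
  cols = columns.join(", ")

  statements = [
    "CREATE TABLE #{tmp} AS SELECT DISTINCT #{cols} FROM #{table}",
    "DROP TABLE #{table}",
    "ALTER TABLE #{tmp} RENAME TO #{table}"
  ]

  # DROP TABLE removed the old per-column indexes, so recreate them using
  # the same names that appear in the question's schema.
  columns.each do |col|
    statements << "CREATE INDEX index_#{table}_on_#{col} ON #{table} (#{col})"
  end

  # Assumption: a unique index on the column pair is wanted so that
  # PostgreSQL rejects new duplicate rows.
  unique_name = "index_#{table}_on_#{columns.join('_and_')}"
  statements << "CREATE UNIQUE INDEX #{unique_name} ON #{table} (#{cols})"

  statements
end

statements = rebuild_join_table_sql("questions_tags", ["question_id", "tag_id"])
puts statements
```

Each string could then be passed to Question.connection.execute, as in the question's console session. PostgreSQL runs DDL inside transactions, so wrapping the loop in Question.transaction should make the swap all-or-nothing.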
Answer 1 (score: 1)
Delete tag associations with bad tag references:
DELETE FROM questions_tags
WHERE NOT EXISTS ( SELECT 1
FROM tags
WHERE tags.id = questions_tags.tag_id);
Delete tag associations with bad question references:
DELETE FROM questions_tags
WHERE NOT EXISTS ( SELECT 1
FROM questions
WHERE questions.id = questions_tags.question_id);
Delete duplicate tag associations:
-- Assumes questions_tags has an id column; the schema in the question
-- (:id => false) does not have one.
DELETE FROM questions_tags
USING ( SELECT qt3.tag_id, qt3.question_id, MIN(qt3.id) AS id
FROM questions_tags qt3
GROUP BY qt3.tag_id, qt3.question_id
) qt2
WHERE questions_tags.tag_id=qt2.tag_id AND
questions_tags.question_id=qt2.question_id AND
questions_tags.id != qt2.id;
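The duplicate-removal statement above relies on an id column. For a join table created with :id => false, one PostgreSQL-specific alternative is to compare ctid, the system column that identifies each row's physical location. The sketch below just builds that statement as a string. dedupe_by_ctid_sql is a hypothetical helper name, and on a table of roughly 40 million rows this self-join may well be slower than rebuilding the table.

```ruby
# Hypothetical helper: builds a PostgreSQL-only DELETE that keeps, for each
# group of identical rows, the row with the smallest ctid and removes the rest.
def dedupe_by_ctid_sql(table, columns)
  # Rows a and b are duplicates of each other when every column matches.
  match = columns.map { |col| "a.#{col} = b.#{col}" }.join(" AND ")

  # a is deleted whenever some matching row b sits at a smaller ctid.
  "DELETE FROM #{table} a USING #{table} b WHERE #{match} AND a.ctid > b.ctid"
end

puts dedupe_by_ctid_sql("questions_tags", ["question_id", "tag_id"])
```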
Note:
Please test the SQL in a development environment first, before running it in production.
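As part of that testing, the question's suspicion about duplicates can be checked before anything is deleted. With 417 questions and 601 tags there can be at most 417 × 601 distinct valid pairs, so any rows beyond that must be duplicates or rows pointing at missing questions or tags. The sketch below works through that arithmetic with the counts from the question and builds a query listing the most-repeated pairs; the helper names are hypothetical.

```ruby
# Upper bound on distinct (question_id, tag_id) pairs, assuming every row
# references an existing question and an existing tag.
def max_unique_pairs(question_count, tag_count)
  question_count * tag_count
end

# Rows beyond that bound: each one is a duplicate or an orphaned reference.
def surplus_rows(join_rows, question_count, tag_count)
  [join_rows - max_unique_pairs(question_count, tag_count), 0].max
end

# Builds a query that lists the pairs stored more than once, worst first.
def duplicate_pairs_sql(table, columns, limit = 10)
  cols = columns.join(", ")
  "SELECT #{cols}, COUNT(*) AS copies FROM #{table} " \
    "GROUP BY #{cols} HAVING COUNT(*) > 1 ORDER BY copies DESC LIMIT #{limit}"
end

# Counts taken from the console session in the question.
puts max_unique_pairs(417, 601)          # 250617
puts surplus_rows(39_919_778, 417, 601)  # 39669161
puts duplicate_pairs_sql("questions_tags", ["question_id", "tag_id"])
```

So at least 39,669,161 of the 39,919,778 rows cannot be unique valid associations. Running the generated query through Question.connection.execute before and after a cleanup gives a quick check that the duplicates are gone.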