Here is the query in Rails:
User.limit(20).
where.not(id: to_skip, number_of_photos: 0).
where(age: @user.seeking_age_min..@user.seeking_age_max).
tagged_with(@user.seeking_traits, on: :trait, any: true).
tagged_with(@user.seeking_gender, on: :trait, any: true).ids
Here is the EXPLAIN ANALYZE output. Note that the id <> ALL(...) part has been shortened; the real list contains about 10K ids.
Limit (cost=23.32..5331.16 rows=20 width=1698) (actual time=2237.871..2243.709 rows=20 loops=1)
-> Nested Loop Semi Join (cost=23.32..875817.48 rows=3300 width=1698) (actual time=2237.870..2243.701 rows=20 loops=1)
-> Merge Semi Join (cost=22.89..857813.95 rows=8311 width=1702) (actual time=463.757..2220.691 rows=1351 loops=1)
Merge Cond: (users.id = users_trait_taggings_356a192.taggable_id)
-> Index Scan using users_pkey on users (cost=0.29..834951.51 rows=37655 width=1698) (actual time=455.122..2199.322 rows=7866 loops=1)
Index Cond: (id IS NOT NULL)
Filter: ((number_of_photos <> 0) AND (age >= 18) AND (age <= 99) AND (id <> ALL ('{7066,7065,...,15624,23254}'::integer[])))
Rows Removed by Filter: 7652
-> Index Only Scan using taggings_idx on taggings users_trait_taggings_356a192 (cost=0.42..22767.59 rows=11393 width=4) (actual time=0.048..16.009 rows=4554 loops=1)
Index Cond: ((tag_id = 2) AND (taggable_type = 'User'::text) AND (context = 'trait'::text))
Heap Fetches: 4554
-> Index Scan using index_taggings_on_taggable_id_and_taggable_type_and_context on taggings users_trait_taggings_5df4b2a (cost=0.42..2.16 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1351)
Index Cond: ((taggable_id = users.id) AND ((taggable_type)::text = 'User'::text) AND ((context)::text = 'trait'::text))
Filter: (tag_id = ANY ('{4,6}'::integer[]))
Rows Removed by Filter: 2
Total runtime: 2243.913 ms
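The Index Scan using users_pkey on users step appears to be where most of the time goes (it finishes at about 2,199 ms of the 2,244 ms total), even though there are indexes on age, number_of_photos, and id: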
add_index "users", ["age"], name: "index_users_on_age", using: :btree
add_index "users", ["number_of_photos"], name: "index_users_on_number_of_photos", using: :btree
to_skip is an array of user IDs to skip, that is, to exclude from the results. A user has many skips, and each skip has a partner_id.
So to fetch to_skip I am doing:
to_skip = @user.skips.pluck(:partner_id)
I tried isolating the query down to:
sql = User.limit(20).
where.not(id: to_skip, number_of_photos: 0).
where(age: @user.seeking_age_min..@user.seeking_age_max).to_sql
I still run into the same problem in EXPLAIN ANALYZE. Again, the list of user IDs has been truncated:
Limit (cost=0.00..435.34 rows=20 width=1698) (actual time=0.219..4.844 rows=20 loops=1)
-> Seq Scan on users (cost=0.00..819629.38 rows=37655 width=1698) (actual time=0.217..4.838 rows=20 loops=1)
Filter: ((id IS NOT NULL) AND (number_of_photos <> 0) AND (age >= 18) AND (age <= 99) AND (id <> ALL ('{7066,7065,...,15624,23254}'::integer[])))
Rows Removed by Filter: 6
Total runtime: 5.044 ms
Any ideas on how to optimize this query in Rails + Postgres?
EDIT: Here are the relevant models:
class User < ActiveRecord::Base
acts_as_messageable required: :body, # default [:topic, :body]
dependent: :destroy
has_many :skips, :dependent => :destroy
acts_as_taggable # Alias for acts_as_taggable_on :tags
acts_as_taggable_on :seeking_gender, :trait, :seeking_race
scope :by_updated_date, -> {
order("updated_at DESC")
}
end
# schema
create_table "users", force: :cascade do |t|
t.string "email", default: "", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.text "skips", array: true
t.integer "number_of_photos", default: 0
t.integer "age"
end
add_index "users", ["age"], name: "index_users_on_age", using: :btree
add_index "users", ["email"], name: "index_users_on_email", unique: true, using: :btree
add_index "users", ["number_of_photos"], name: "index_users_on_number_of_photos", using: :btree
add_index "users", ["updated_at"], name: "index_users_on_updated_at", order: {"updated_at"=>:desc}, using: :btree
class Skip < ActiveRecord::Base
belongs_to :user
end
# schema
create_table "skips", force: :cascade do |t|
t.integer "user_id"
t.integer "partner_id"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
add_index "skips", ["partner_id"], name: "index_skips_on_partner_id", using: :btree
add_index "skips", ["user_id"], name: "index_skips_on_user_id", using: :btree
Answer 0 (score: 2)
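The speed problem is probably caused by the long list of ids in to_skip (about 60 KB) being passed in as an array. If so, the fix is to rewrite it as the result of a subquery, so that the database can optimize the query better.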
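When building to_skip, try using select instead of pluck. pluck returns an array, which is then passed to the main query as a list of literal values. select, by contrast, returns an ActiveRecord::Relation, whose SQL can be embedded in the main query as a subquery, which may be more efficient.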
to_skip = @user.skips.select(:partner_id)
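To make the size difference concrete, here is a minimal plain-Ruby sketch. The helpers not_in_literal_sql and not_in_subquery_sql are hypothetical and only imitate the shape of the SQL each approach leads to; they are not ActiveRecord's actual SQL generator, and the sample ids are made up.

```ruby
# Hypothetical helpers that imitate the shape of the generated SQL.
# They are NOT ActiveRecord's real SQL builder.

# pluck-style: every id is written into the statement as a literal.
def not_in_literal_sql(ids)
  '"users"."id" NOT IN (' + ids.join(', ') + ')'
end

# select-style: the ids stay in the database and are reached via a subquery.
def not_in_subquery_sql(user_id)
  '"users"."id" NOT IN (SELECT "skips"."partner_id" FROM "skips" ' \
    'WHERE "skips"."user_id" = ' + Integer(user_id).to_s + ')'
end

# Made-up data: 10,000 five-digit ids, roughly the volume described in the question.
sample_ids = (10_000...20_000).to_a

literal_sql  = not_in_literal_sql(sample_ids)
subquery_sql = not_in_subquery_sql(1)

puts "literal list: #{literal_sql.bytesize} bytes"
puts "subquery:     #{subquery_sql.bytesize} bytes"
```

The literal version grows with every skipped user and has to be sent and parsed on each request, while the subquery version stays the same size no matter how many skips exist.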
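Until the model code is posted, it is hard to give more specific advice. The general direction I would explore is to merge all the related steps into a single query and let the database do the optimizing.
UPDATE:
The Active Record query using select would look something like this (I have left out the taggable parts, since they do not seem to affect performance much):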
User.limit(20).
where.not(id: @user.skips.select(:partner_id), number_of_photos: 0).
where(age: 0..25)
Here is the SQL query that gets executed. Note how a subquery fetches the IDs to skip:
SELECT "users".* FROM "users"
WHERE ("users"."number_of_photos" != 0)
AND ("users"."id" NOT IN (
SELECT "skips"."partner_id"
FROM "skips"
WHERE "skips"."user_id" = 1
))
AND ("users"."age" BETWEEN 0 AND 25)
LIMIT 20
Try running the query this way and see how it affects performance.
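One caveat with the NOT IN (subquery) form: in the schema above, skips.partner_id has no null: false constraint, and in SQL a NOT IN against a list that contains a NULL never evaluates to true, so the query would return no rows. The sketch below models that three-valued logic in plain Ruby; sql_not_in and filter_users are hypothetical helpers written for illustration, not part of Rails or PostgreSQL.

```ruby
# Hypothetical model of SQL's three-valued logic for `value NOT IN (list)`.
# Returns true, false, or nil (nil stands in for SQL's UNKNOWN).
def sql_not_in(value, list)
  return true if list.empty?             # NOT IN over an empty set is always true
  return nil if value.nil?               # NULL compared with anything is UNKNOWN
  return false if list.include?(value)   # a definite match makes NOT IN false
  return nil if list.include?(nil)       # no match, but a NULL leaves it UNKNOWN
  true
end

# A WHERE clause keeps a row only when the predicate is exactly true.
def filter_users(user_ids, skipped_partner_ids)
  user_ids.select { |id| sql_not_in(id, skipped_partner_ids) == true }
end

p filter_users([1, 2, 3], [2])        # => [1, 3]
p filter_users([1, 2, 3], [2, nil])   # => []
```

If NULL partner_id values can occur, one possible guard (an assumption about your data, not something the question confirms) is to filter them out of the subquery, for example with @user.skips.where.not(partner_id: nil).select(:partner_id).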