Question

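This happens to be a Rails application, but I've included the generated SQL, so it should be easy to see what I'm getting at.

I have a parent ->> child relationship in a Postgres database. There are about 5 million children. I can get all the ids (primary keys) of a parent's children in a short time (3 s). I can fetch a single child by id in a short time (32 ms).

Why does getting a parent's first child take so long (> 5 minutes)? What is Postgres doing that takes that long, when it feels like something it could finish in a few seconds?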

How many children there are:

[3] pry(main)> Child.count
   (8832.5ms)  SELECT COUNT(*) FROM "children"
=> 5040608

How many children the parent has:

[11] pry(main)> parent.children.count
   (76.0ms)  SELECT COUNT(*) FROM "children" WHERE "children"."parent_id" = $1  [["parent_id", 98107]]
=> 5213

Getting all the ids of the parent's children:

[5] pry(main)> ids = parent.children.ids; nil
   (3184.6ms)  SELECT "children"."id" FROM "children" WHERE "children"."parent_id" = $1  [["parent_id", 98107]]

Using an id from the list above to load the parent's first child:

[6] pry(main)> i = Child.find ids.first
  Child Load (31.7ms)  SELECT  "children".* FROM "children" WHERE "children"."id" = $1 LIMIT $2  [["id", 7368558], ["LIMIT", 1]]

And the EXPLAIN:

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit  (cost=0.44..8.46 rows=1 width=1227)
   ->  Index Scan using children_pkey on children  (cost=0.44..8.46 rows=1 width=1227)
         Index Cond: (id = 7368558)
(3 rows)

Trying to skip a step and just load the parent's first child. It times out after 5 minutes. Why?

[12] pry(main)> parent.children.first
  Child Load (308569.6ms)  SELECT  "children".* FROM "children" WHERE "children"."parent_id" = $1 ORDER BY "children"."id" ASC LIMIT $2  [["parent_id", 98107], ["LIMIT", 1]]

I'm afraid I don't know how to get a query plan for this query that times out. I'll poke at it...
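One way to get a plan without waiting for the query to finish is plain EXPLAIN (without ANALYZE), which asks Postgres to plan the statement but not run it. The sketch below only builds that statement from the slow query shown above; `explain_statement` is a hypothetical helper name, and the commented-out call at the end assumes an ordinary ActiveRecord connection to PostgreSQL.

```ruby
# Build a plain EXPLAIN statement for a query. Plain EXPLAIN (no ANALYZE)
# plans the query without executing it, so it should return quickly even
# when the query itself would time out.
#
# `explain_statement` is a hypothetical helper, not part of Rails.
def explain_statement(sql)
  "EXPLAIN #{sql.strip.chomp(';')}"
end

# The slow query from above, with the bind parameters inlined.
SLOW_QUERY = <<~SQL
  SELECT "children".* FROM "children"
  WHERE "children"."parent_id" = 98107
  ORDER BY "children"."id" ASC
  LIMIT 1
SQL

puts explain_statement(SLOW_QUERY)

# In a Rails console the statement could then be sent over the existing
# connection (assumption: ActiveRecord with the PostgreSQL adapter):
#   ActiveRecord::Base.connection.execute(explain_statement(SLOW_QUERY)).values
```

The resulting plan shows which index the planner intends to use, which is the missing piece for diagnosing the timeout.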

Schema definition for the children table:

create_table "children", id: :serial, force: :cascade do |t|
  t.integer "parent_id"
  t.integer "some_table_id"
  t.integer "another_table_id"
  t.date "start_date"
  t.date "end_date"
  t.integer "number_field", default: 0, null: false
  t.string "another_number_field", default: "USD", null: false
  t.integer "probably_unused", default: 0, null: false
  t.string "probably_unused_localized", default: "USD", null: false
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.bigint "sum", default: 0, null: false
  t.integer "some_count", default: 0, null: false
  t.integer "session_count", default: 0, null: false
  t.integer "duration", default: 0, null: false
  t.string "some_cycle", null: false
  t.hstore "bad_hash", default: {}, null: false
  t.index ["parent_id", "some_table_id", "some_cycle"], name: "unique_to_parent_sim_and_cycle", unique: true
  t.index ["parent_id"], name: "index_children_on_parent_id"
  t.index ["another_table_id"], name: "index_children_on_another_table_id"
  t.index ["some_table_id"], name: "index_children_on_some_table_id"
  t.index ["start_date"], name: "index_children_on_start_date"
end

Answer 1

OK, a few things here:

In the first query, you are not using ORDER BY:

SELECT  "children".* FROM "children" 
WHERE "children"."id" = $1 
LIMIT $2  [["id", 7368558], ["LIMIT", 1]]

But in the slow query you are:

SELECT  "children".* FROM "children" 
WHERE "children"."parent_id" = $1 
ORDER BY "children"."id" ASC 
LIMIT $2  [["parent_id", 98107], ["LIMIT", 1]]

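When you add ORDER BY, Postgres has to return this parent's children in id order before it can pick the first one. Leaving it out should be faster. Note that children.id is indexed (it is the primary key, and the EXPLAIN above uses children_pkey), so a missing index is not the cause; more likely the planner reads rows in id order and filters on parent_id instead of using index_children_on_parent_id, though only the plan for the slow query can confirm that.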
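To illustrate how an ORDER BY id ... LIMIT 1 can end up slower than fetching every id for the parent, here is a toy model in plain Ruby. It is a sketch under stated assumptions, not a description of what Postgres actually did: the table size, the placement of the parent's rows at the high end of the id range, and all method names are made up for illustration. It compares walking rows in id order while filtering on parent_id against looking up the parent's rows first and then taking the smallest id.

```ruby
# Toy model of the children table: rows in id order.
# All numbers are invented for illustration; they are not the real data.
Row = Struct.new(:id, :parent_id)

def build_table(total_rows:, target_parent:, target_children:)
  # Assumption: the target parent's children have the highest ids,
  # i.e. they were inserted last. Every other row belongs to parent 1.
  (1..total_rows).map do |id|
    parent = id > total_rows - target_children ? target_parent : 1
    Row.new(id, parent)
  end
end

# Strategy A: walk rows in id order (like scanning the primary-key index)
# and filter on parent_id, stopping at the first match.
def first_child_via_pkey_scan(table, parent_id)
  visited = 0
  table.each do |row|
    visited += 1
    return [row, visited] if row.parent_id == parent_id
  end
  [nil, visited]
end

# Strategy B: use a lookup keyed by parent_id to fetch only that parent's
# rows, then pick the smallest id among them.
def first_child_via_parent_index(parent_index, parent_id)
  rows = parent_index.fetch(parent_id, [])
  [rows.min_by(&:id), rows.size]
end

table = build_table(total_rows: 100_000, target_parent: 98107, target_children: 50)
parent_index = table.group_by(&:parent_id)

row_a, visited_a = first_child_via_pkey_scan(table, 98107)
row_b, visited_b = first_child_via_parent_index(parent_index, 98107)

puts "pkey scan:    id=#{row_a.id}, rows visited=#{visited_a}"
puts "parent index: id=#{row_b.id}, rows visited=#{visited_b}"
```

Both strategies return the same row, but in this model the id-ordered scan touches almost the whole table while the parent_id lookup touches only that parent's rows. Whether the real query suffers from this depends on the plan Postgres chose.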

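Also, note that in the first query your WHERE is on "children"."id", while in the second it is on "children"."parent_id", which I assume is a typo? Depending on which indexes you have (or don't have) on these two columns, you will also see a difference in speed.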
