How can an index be used in a query with an aggregate?

Time: 2015-04-08 19:26:18

Tags: sql indexing group-by aggregate-functions

Given a query like this:

SELECT franchise, MAX(worth)
FROM figurines
GROUP BY franchise

what kind of index would speed up this query, and how would the database use that index?

If more detail is needed, assume that the franchise column has relatively low cardinality and that worth has very high cardinality.

I personally use MySQL, but I am looking for a general understanding of the algorithms rather than vendor-specific implementation details.

1 Answer:

Answer 0 (score: 1)

Scenario 1: No index (read the whole table)

foreach(page in table.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = table.pages

Scenario 2: Index on franchise only

foreach(page in index.pages)
{
  foreach(indexRow in page.rows)
  {
    tableRow = table.fetchRow(indexRow); // + 1 page of IO for each row
    Compare and accumulate franchise from indexRow and worth from tableRow
  }
}
-- Total IO = index.pages + table.rows
-- this is likely to be greater than Scenario 1,
-- so the optimizer should prefer the Scenario 1 plan (full table scan) instead.
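
To make the comparison between Scenarios 1 and 2 concrete, here is a rough sketch of the IO arithmetic in Python. The row count, rows per page, and index size are made-up assumptions for illustration, not measurements from any real database, and the function names are hypothetical.

```python
def full_scan_io(table_pages: int) -> int:
    """Scenario 1: read every page of the table once."""
    return table_pages


def noncovering_index_io(index_pages: int, table_rows: int) -> int:
    """Scenario 2: read the index, plus one table page fetch per row."""
    return index_pages + table_rows


# Hypothetical sizes (assumptions, chosen only to show the scale).
table_rows = 1_000_000
rows_per_page = 100
table_pages = table_rows // rows_per_page
index_pages = 2_000

scenario_1_io = full_scan_io(table_pages)
scenario_2_io = noncovering_index_io(index_pages, table_rows)

print("Scenario 1 IO:", scenario_1_io)
print("Scenario 2 IO:", scenario_2_io)
```

Under these assumptions the per-row fetches dominate, which is why an index on franchise alone does not help this query.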

Scenario 3: Covering index on (franchise, worth), in that order

foreach(page in index.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = index.pages
-- Assuming that index is thinner than table, a win!
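
As an illustration of the Scenario 3 scan, the sketch below uses a sorted Python list as a stand-in for the covering index. This is an assumption for demonstration only: a real index is a paged B-tree, and the sample data and the function name max_worth_by_scan are hypothetical. Because the entries are ordered by (franchise, worth), the last entry seen for each franchise carries its maximum worth.

```python
# Hypothetical index entries, kept sorted by (franchise, worth)
# the way a covering index on those two columns would store them.
index_entries = sorted([
    ("Marvel", 40),
    ("DC", 25),
    ("Marvel", 15),
    ("DC", 90),
    ("Pokemon", 60),
])


def max_worth_by_scan(entries):
    """Scan every index entry once, never touching the table."""
    result = {}
    for franchise, worth in entries:
        # Entries for one franchise arrive in ascending worth order,
        # so the last value written for a franchise is its maximum.
        result[franchise] = worth
    return result


scan_result = max_worth_by_scan(index_entries)
print(scan_result)
```

The scan still visits every entry, so its cost grows with the size of the index rather than with the number of franchises.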

Scenario 4: A different query over a known list of franchises, using the index from Scenario 3

foreach(franchise in franchises)
{
  SELECT MAX(worth) FROM figurines WHERE franchise = :franchise  -- :franchise is the loop variable
}

...

foreach(franchise in franchises)
{
  search into the index looking for the last record with this franchise
  // this is usually less than 10 pages of IO in my experience.
}
-- Total IO = count of franchise * 10
-- super win!

Scenario 4 is different because it issues seeks instead of scans.
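
The difference between a seek and a scan can be sketched with a binary search over a sorted list. This is only an analogy, assuming the list plays the role of the (franchise, worth) index; a real engine descends a B-tree instead, and the sample data and the name max_worth_by_seek are hypothetical.

```python
from bisect import bisect_right

# Hypothetical index entries, sorted by (franchise, worth).
index_entries = sorted([
    ("Marvel", 40),
    ("DC", 25),
    ("Marvel", 15),
    ("DC", 90),
    ("Pokemon", 60),
])


def max_worth_by_seek(entries, franchise):
    """Jump straight to the last entry for one franchise."""
    # Find the position just past every entry for this franchise.
    pos = bisect_right(entries, (franchise, float("inf")))
    if pos == 0 or entries[pos - 1][0] != franchise:
        return None  # this franchise has no entries in the index
    return entries[pos - 1][1]


franchises = ["DC", "Marvel", "Pokemon"]
seek_result = {f: max_worth_by_seek(index_entries, f) for f in franchises}
print(seek_result)
```

Each lookup costs a number of steps that is logarithmic in the index size, so with few franchises the total work stays small no matter how many rows each franchise has.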