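Why did changing my table schema slow down my queries?

Time: 2014-05-14 14:35:31

Tags: sql database performance sqlite

Today I made some changes to a table to try to make certain kinds of queries run faster. This is the table (before my changes):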

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names(id),
  postal_code INTEGER NOT NULL REFERENCES postal_codes(id),
  city INTEGER NOT NULL REFERENCES cities(id),
  municipality INTEGER NOT NULL REFERENCES municipalities(id),
  CONSTRAINT unique_address UNIQUE(
    street_name, house_number, entrance, postal_code, city
  )
)

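This table has two indexes (that I can identify): the primary key and the 5-column unique key. I often need to look up street addresses using only the house number and postal code columns, or only the house number and city columns, so I changed the table creation SQL to: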

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL REFERENCES postal_codes,
  city INTEGER NOT NULL REFERENCES cities,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
  ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
  ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
  ON street_addresses (
    street_name, house_number, entrance, postal_code, city
  );

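I added two indexes and moved the UNIQUE index out of the table definition (so that I have all my keys in one place). I also removed (id) from the REFERENCES clauses because, according to the documentation, the primary key is used by default. My database is now considerably larger, but at least fetching addresses by house number and postal code is dozens of times faster!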
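To check which indexes each version of the table actually ends up with, the two schemas can be loaded into throwaway in-memory databases and inspected with PRAGMA index_list. This is a minimal sketch, assuming Python's standard sqlite3 module; the helper name index_names is hypothetical, and the REFERENCES clauses are left out because they do not create indexes on this table.

```python
import sqlite3

# Columns shared by both versions of the table (REFERENCES omitted for brevity).
COLUMNS = """
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL,
  municipality INTEGER NOT NULL
"""

# Before the change: the UNIQUE constraint lives inside the table definition.
OLD_SCHEMA = f"""
CREATE TABLE street_addresses ({COLUMNS},
  CONSTRAINT unique_address UNIQUE(
    street_name, house_number, entrance, postal_code, city
  )
);
"""

# After the change: all three indexes are created explicitly.
NEW_SCHEMA = f"""
CREATE TABLE street_addresses ({COLUMNS});
CREATE INDEX sa_hn_pc ON street_addresses (house_number, postal_code);
CREATE INDEX sa_hn_ci ON street_addresses (house_number, city);
CREATE UNIQUE INDEX sa_unique_address ON street_addresses (
  street_name, house_number, entrance, postal_code, city
);
"""


def index_names(schema):
    """Return the sorted names of all indexes on street_addresses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    rows = conn.execute("PRAGMA index_list(street_addresses)").fetchall()
    conn.close()
    return sorted(row[1] for row in rows)  # column 1 holds the index name


old_indexes = index_names(OLD_SCHEMA)
new_indexes = index_names(NEW_SCHEMA)
print(old_indexes)
print(new_indexes)
```

The INTEGER PRIMARY KEY column is an alias for the rowid, so it does not show up as a separate index in either listing; the UNIQUE constraint in the old schema appears as an automatically named index.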

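Unfortunately, queries that search by street name and house number (the most common kind of query against my database) no longer seem to use my index. Before the table change I got about 1700 reads per second using street name and house number; now I get ~50. If I search using all 5 columns I still get the good old speed, but using only the first 2 columns of the UNIQUE key is now very slow.

Also, queries using house number and city are still almost as slow as before, and much slower than searching by house number and postal code.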

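Any idea what's going on? Do I need to define a new index for street name and house number, even though those columns are part of the UNIQUE key? If so, why were my queries so fast before? Also, why aren't house number and city queries as fast as house number and postal code queries?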
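On the prefix question in isolation: SQLite can use a multi-column index for equality lookups on its leading columns. The sketch below is a minimal check of that, assuming Python's standard sqlite3 module, an empty in-memory table, and only the UNIQUE index present. It filters on sa.street_name directly, which is not what the real query does (that one filters on the name in street_names through a join), so it shows the index is usable for the prefix, not which index the planner picks for the full query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL,
  municipality INTEGER NOT NULL
);
CREATE UNIQUE INDEX sa_unique_address ON street_addresses (
  street_name, house_number, entrance, postal_code, city
);
""")

# Constrain only the first two columns of the 5-column index.
rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT latitude, longitude FROM street_addresses "
    "WHERE street_name = ? AND house_number = ?",
    (1, 11),
).fetchall()

# The human-readable description is the last column of each plan row.
prefix_plan = [row[-1] for row in rows]
print(prefix_plan)
conn.close()
```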

Sorry for the wall of text. I hope someone can help. My benchmarks, SELECT query, and table schemas follow.


My benchmarks:

Before changing the table:

$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number
[ ============================ 100% (10000/10000) ============================ ]
5.9129 seconds
0.0006 seconds per interval
1691 intervals per second

$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.2198 seconds
0.0003 seconds per interval
3106 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --postal_code
[ ============================== 100% (100/100) ============================== ]
9.957 seconds
0.0996 seconds per interval
10 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --city
[ ============================== 100% (100/100) ============================== ]
10.2446 seconds
0.1024 seconds per interval
10 intervals per second

After changing the table:

# This is now so dreadfully slow I can't do 10000 intervals.
$ bin/benchmark_norway_database --search-by-components 500 --street_name --house_number
[ ============================== 100% (500/500) ============================== ]
9.5749 seconds
0.0191 seconds per interval
52 intervals per second

# Still fast!
$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.4125 seconds
0.0003 seconds per interval
2930 intervals per second

# Much, much faster than before!
$ bin/benchmark_norway_database --search-by-components 10000 --house_number --postal_code
[ ============================ 100% (10000/10000) ============================ ]
22.2646 seconds
0.0022 seconds per interval
449 intervals per second

# Still slow? Why? :S
$ bin/benchmark_norway_database --search-by-components 500 --house_number --city
[ ============================== 100% (500/500) ============================== ]
14.3483 seconds
0.0287 seconds per interval
35 intervals per second

My select query:

SELECT
  sn.name, sa.house_number, sa.entrance, pc.postal_code,
  ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
  street_addresses AS sa
  INNER JOIN street_names   AS sn ON sa.street_name  = sn.id
  INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id
  INNER JOIN cities         AS ci ON sa.city         = ci.id
  INNER JOIN municipalities AS mu ON sa.municipality = mu.id
  INNER JOIN counties       AS co ON mu.county       = co.id
WHERE
  ...
ORDER BY
  ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
  0, 100

Note: in the WHERE clause I use GLOB when searching by street name, for example:

WHERE
  sn.name GLOB 'FORNEBUVEIEN' AND
  sa.house_number = 11

All of my table schemas, in case they are relevant:

CREATE TABLE IF NOT EXISTS counties (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS municipalities (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL,
  number INTEGER NOT NULL,
  county INTEGER NOT NULL REFERENCES counties,
  CONSTRAINT unique_municipality UNIQUE(name, county)
);
CREATE UNIQUE INDEX IF NOT EXISTS mu_number
  ON municipalities (number);
CREATE UNIQUE INDEX IF NOT EXISTS mu_unique_name_co
  ON municipalities (name, county);

CREATE TABLE IF NOT EXISTS cities (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE UNIQUE INDEX IF NOT EXISTS ci_unique_name_mu
  ON cities (name, municipality);

CREATE TABLE IF NOT EXISTS postal_codes (
  id INTEGER PRIMARY KEY NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL REFERENCES cities
);
CREATE UNIQUE INDEX IF NOT EXISTS po_postal_code
  ON postal_codes (postal_code);

CREATE TABLE IF NOT EXISTS street_names (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS sn_name
  ON street_names (name);

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL REFERENCES postal_codes,
  city INTEGER NOT NULL REFERENCES cities,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
  ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
  ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
  ON street_addresses (
    street_name, house_number, entrance, postal_code, city
  );

I run these commands after importing all the data:

PRAGMA journal_mode = OFF;
PRAGMA page_size = 65536;
VACUUM;
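A new page size set on an existing database only takes effect once the file is rebuilt by VACUUM, so it can be worth confirming that the change was applied. This is a minimal sketch, assuming Python's standard sqlite3 module and a temporary stand-in database; the file name example.db and the table t are hypothetical, and the journal_mode pragma is left out because it does not affect the page size.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.db")  # hypothetical stand-in file

    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO t (name) VALUES ('FORNEBUVEIEN')")
    size_before = conn.execute("PRAGMA page_size").fetchone()[0]

    # Request the new page size, then rebuild the file so it is applied.
    conn.execute("PRAGMA page_size = 65536")
    conn.execute("VACUUM")
    size_after = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()

print(size_before, size_after)
```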

EXPLAIN QUERY PLAN when using street name and house number:

sqlite> EXPLAIN QUERY PLAN SELECT sn.name, sa.house_number, sa.entrance, pc.postal_code, ci.name, mu.name, co.name, sa.latitude, sa.longitude FROM street_addresses AS sa INNER JOIN street_names   AS sn ON sa.street_name  = sn.id INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id INNER JOIN cities         AS ci ON sa.city         = ci.id INNER JOIN municipalities AS mu ON sa.municipality = mu.id INNER JOIN counties       AS co ON mu.county       = co.id WHERE sn.name GLOB "FORNEBUVEIEN" AND sa.house_number=11 ORDER BY ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC LIMIT 0, 100;
selectid    order       from        detail                                                                   
----------  ----------  ----------  -------------------------------------------------------------------------
0           0           0           SEARCH TABLE street_addresses AS sa USING INDEX sa_hn_ci (house_number=?)
0           1           1           SEARCH TABLE street_names AS sn USING INTEGER PRIMARY KEY (rowid=?)      
0           2           2           SEARCH TABLE postal_codes AS pc USING INTEGER PRIMARY KEY (rowid=?)      
0           3           3           SEARCH TABLE cities AS ci USING INTEGER PRIMARY KEY (rowid=?)            
0           4           4           SEARCH TABLE municipalities AS mu USING INTEGER PRIMARY KEY (rowid=?)    
0           5           5           SEARCH TABLE counties AS co USING INTEGER PRIMARY KEY (rowid=?)          
0           0           0           USE TEMP B-TREE FOR ORDER BY

1 answer:

Answer 0 (score: 0)

It turns out that with a WHERE clause like this in my SELECT query:

WHERE
  sn.name GLOB ? AND
  sa.house_number = ?

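SQLite3 chooses the index sa_hn_ci (house_number, city) instead of sa_unique_address. This made the query run about 100 times slower.

I now work around this by using INDEXED BY whenever my query includes a street name: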

SELECT
  sn.name, sa.house_number, sa.entrance, pc.postal_code,
  ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
  street_addresses AS sa INDEXED BY sa_unique_address          -- This line!
  INNER JOIN street_names   AS sn ON sa.street_name  = sn.id
  INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id
  INNER JOIN cities         AS ci ON sa.city         = ci.id
  INNER JOIN municipalities AS mu ON sa.municipality = mu.id
  INNER JOIN counties       AS co ON mu.county       = co.id
WHERE
  sn.name GLOB 'FORNEBUVEIEN' AND
  sa.house_number=11
ORDER BY
  ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
  0, 100;

But I don't know why SQLite3 chose the wrong index. Running ANALYZE didn't change anything.

I have not marked this answer as accepted.
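The planner's choice with and without the INDEXED BY hint can be compared by running EXPLAIN QUERY PLAN on both forms of the query. The sketch below assumes Python's standard sqlite3 module and an empty in-memory copy of the schema, reduced to the two tables named in the WHERE clause; the helper query_plan is hypothetical. On an empty database the unhinted plan may differ from the one reported above, so the comparison is more informative when run against the real data file.

One possible reading, which the source does not confirm: the query constrains sn.name rather than sa.street_name, so the only constant condition on street_addresses is house_number, and the two new indexes both start with that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_names (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL
);
CREATE UNIQUE INDEX sn_name ON street_names (name);
CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL,
  municipality INTEGER NOT NULL
);
CREATE INDEX sa_hn_pc ON street_addresses (house_number, postal_code);
CREATE INDEX sa_hn_ci ON street_addresses (house_number, city);
CREATE UNIQUE INDEX sa_unique_address ON street_addresses (
  street_name, house_number, entrance, postal_code, city
);
""")

# Reduced form of the query; {hint} is where INDEXED BY goes, if used.
QUERY = """
SELECT sn.name, sa.house_number, sa.entrance, sa.latitude, sa.longitude
FROM street_addresses AS sa {hint}
  INNER JOIN street_names AS sn ON sa.street_name = sn.id
WHERE sn.name GLOB ? AND sa.house_number = ?
"""


def query_plan(hint=""):
    """Return the plan descriptions for the query, with an optional index hint."""
    sql = "EXPLAIN QUERY PLAN " + QUERY.format(hint=hint)
    rows = conn.execute(sql, ("FORNEBUVEIEN", 11)).fetchall()
    return [row[-1] for row in rows]  # description is the last column


default_plan = query_plan()
forced_plan = query_plan("INDEXED BY sa_unique_address")
print("default:", default_plan)
print("forced: ", forced_plan)
```

If the hinted index cannot be used for the query at all, SQLite fails to prepare the statement instead of silently falling back, so a successful run of the hinted form also confirms the index is usable.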