SQL indexing made some queries take longer. Why?

Time: 2019-01-09 22:30:41

Tags: java sqlite indexing

I have made a database which holds around 24 million records on house sales in the UK. I have written a small Java program which queries the database and displays the results in a table. The user searches for a postcode or a partial postcode and all matches are displayed. I originally worked on an un-indexed table, where full postcodes (e.g. lk4 5th) took about 5 seconds to search, larger searches (e.g. lk4 5) took about 8 seconds, and very large searches (e.g. l) took about 25 seconds. I was asked to index the database as that would increase the speed of queries. I remade the table with the following SQL code:

CREATE TABLE sales(
id TEXT,price INTEGER,sale_date TEXT,postcode TEXT,
prop_type CHAR,newbuild CHAR,leasetype CHAR,
paon TEXT,saon TEXT,street TEXT,locality TEXT,
town TEXT,district TEXT,county TEXT,category CHAR,status CHAR
);
.mode csv
.import C:/Users/(path goes here)
CREATE INDEX i_postcode ON sales(postcode collate nocase);

This has significantly improved the speed of searches that return fewer results (e.g. lk4 5th down to lk4). For the larger searches, however, it has increased the time to an unusable 5+ minutes.

The only query being performed is a very simple one which is:

SELECT price, sale_date, postcode, paon, street, locality FROM sales WHERE postcode LIKE ?;

I have used Java's built-in VisualVM tool to view the CPU samples, and org.sqlite.core.NativeDB.step[native] seems to be where almost all of the time is spent. I am completely new to using databases and have been unable to find anything online that suggests an index should have increased the processing time. If you have any ideas about what I can do to speed up the large searches, that would be very much appreciated.

I appreciate your time.

1 Answer:

Answer 0 (score: 0)

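I think the problem is most likely that the postcode is not high-cardinality, especially when the fixed part (before the wildcard) is short (the slower searches with more results), so that instead of a binary search the search effectively becomes an O(n) linear scan.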
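As a rough illustration of why the length of the fixed part matters: a prefix search answered from an index amounts to a range of keys, and the shorter the prefix, the wider that range. The sketch below only computes the two bounds for a given prefix. `PrefixRange` and `boundsForPrefix` are hypothetical names, not part of SQLite or the asker's program, and the code ignores case folding and the `_` wildcard.

```java
public class PrefixRange {

    /**
     * Returns {lower, upper} such that every key k with
     * lower <= k < upper starts with the given non-empty prefix.
     * Assumption: the last character is not Character.MAX_VALUE.
     */
    public static String[] boundsForPrefix(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int lastIndex = prefix.length() - 1;
        // Bump the last character by one to get the exclusive upper bound.
        char next = (char) (prefix.charAt(lastIndex) + 1);
        String upper = prefix.substring(0, lastIndex) + next;
        return new String[] { prefix, upper };
    }

    public static void main(String[] args) {
        for (String p : new String[] { "lk4 5", "lk4", "l" }) {
            String[] b = boundsForPrefix(p);
            System.out.println(p + "% covers keys in [" + b[0] + ", " + b[1] + ")");
        }
    }
}
```

For `l` the range runs from `l` up to `m`, that is, every postcode beginning with that letter. Each index entry in the range still has to be followed back to its table row, which is one plausible reason a wide range can end up slower than a plain scan of the table.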

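I have never tried it, but if the fixed part of the search argument is shorter than 3 characters (so l% and lk%, but not lk4%), you could write +postcode so that the query uses the more efficient linear scan by rowid (prefixing the column with + tells SQLite not to use the index for that term).

  • The threshold of 3 is based on lk4 having a length of 3.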
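A minimal sketch of that suggestion on the Java side, assuming the program receives the fixed part of the search without the trailing wildcard. `PostcodeQuery`, `sqlFor` and `MIN_INDEXED_PREFIX` are hypothetical names, and the threshold of 3 is the untested guess above rather than a measured value.

```java
public class PostcodeQuery {

    // Hypothetical threshold: prefixes shorter than this skip the index.
    static final int MIN_INDEXED_PREFIX = 3;

    // Same column list as the query in the question.
    static final String SELECT =
        "SELECT price, sale_date, postcode, paon, street, locality "
        + "FROM sales WHERE ";

    /**
     * Picks the query text for the fixed part of a search
     * (for example "l", "lk" or "lk4 5"), without the wildcard.
     */
    public static String sqlFor(String fixedPart) {
        if (fixedPart.trim().length() < MIN_INDEXED_PREFIX) {
            // Unary + keeps SQLite from using i_postcode for this term,
            // so the table is scanned in rowid order instead.
            return SELECT + "+postcode LIKE ?";
        }
        // Longer prefixes match few rows, so the index is left in play.
        return SELECT + "postcode LIKE ?";
    }

    public static void main(String[] args) {
        System.out.println(sqlFor("lk"));
        System.out.println(sqlFor("lk4 5"));
    }
}
```

The returned string can be passed to `Connection.prepareStatement` and the pattern bound with `setString(1, ...)` as before. Timing both forms against the real data would show whether 3 is the right cut-off.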

The following shows the postcode search reverting to a rowid scan when the column is written with +:

Result 1 - no index (image not available)

Result 2 - index use suppressed with + (same as Result 1) (image not available)

Result 3 - + not used, so the index is used (image not available)