Complicated SQL query, large database

Time: 2016-08-02 14:02:24

Tags: sql sql-server

I have the following database tables:

BOOKS

book_id | date | library_id 
1       |  06  | 34
2       |  02  | 12
3       |  04  | 34
4       |  09  | 66

LIBRARY

library_id | adress | owner
1          |  "cxc" | "andf"
2          |  "mkm" | "kla"
3          |  "ass" | "pol"
4          |  "kon" | "ger"

PAGESLLV

page_id | book_id | text
4       |   4     | "YYYY ss"
3       |   1     | "FFF as"
3       |   1     | "FDER fs"
3       |   2     | "GRG xx"

PAGESKYK

page_id | book_id | text
1       |   1     | "ddadad"
2       |   3     | "xcvxcv"
1       |   3     | "adad"
2       |   2     | "ddddweg"

PAGESLOO

page_id | book_id | text
6       |   5     | "VV"
5       |   2     | "CCC"
6       |   2     | "ZZ"
7       |   3     | "AA"

And some information about the db:

1) Each book has many pages

 example:

 Book with id 622 has:
 234 pages with id 45,
 120 pages with id 23,
 1 page with id 11,
 1 page with id 31,

 Book with id 322 has:
 1 page with id 67,
 1 page with id 88

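2) Each book has one library_id

There are 9 tables named PAGE___ (where ___ is something like "LLV"). Each of them has about 24 million records.

Now I need to write a query that returns all books (including the library address) that contain all pages with the given IDs.

So, for example: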

Book with id 622 has:
 234 pages with id 45,
 120 pages with id 88,
 1 page with id 11,
 1 page with id 23,

Book with id 13 has:
 234 pages with id 88,
 120 pages with id 23,
 1 page with id 11,
 1 page with id 15,
 2 pages with id 56,

 Book with id 322 has:
 1 page with id 23,
 1 page with id 88

They give me the array [88, 23, 11, 15] and I return

book_id | date | library_adress | library_owner | 
13      | ~~~~ | ~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~|

because only the book with ID 13 qualifies.

I am using Microsoft SQL Server 2008.

My SQL so far:

'with p1 as (
    select distinct podv.Book_id, podv.Page_Id
    from PAGESLLV podv with (nolock)
    where podv.Page_Id in (' + @Ids + ')
    union all
    select distinct psv.Book_id, psv.Page_Id
    from PAGESXXN psv with (nolock)
    where psv.Page_Id in (' + @Ids + ')
    union all
    select distinct psav.Book_id, psav.Page_Id
    from PAGESTTY psav with (nolock)
    where psav.Page_Id in (' + @Ids + ')
    union all
    select distinct psx.Book_id, psx.Page_Id
    from PASGESPOO psx with (nolock)
    where psx.Page_Id in (' + @Ids + ')
    union all
    select distinct pv.Book_id, pv.Page_Id
    from PAGESMIO pv with (nolock)
    where pv.Page_Id in (' + @Ids + ')
    union all
    select distinct tpb.Book_id, tpb.Page_Id
    from PAGESQWW tpb with (nolock)
    where tpb.Page_Id in (' + @Ids + ')),
p2 as ( select p1.Book_id
    from p1
    group by p1.Book_id
    having COUNT(p1.Book_id) = ' + @Amount + ')
select top 1000
    r.Book_id,
    r.date,
    v.adress,
    v.owner
    from  Books r with (nolock)
    inner join p2 with (nolock) on (r.Book_id = p2.Book_id)
    join Library v with (nolock) on (r.library_id = v.library_id)
    order by r.Book_id')

It works, but it is too slow.

Thanks for your help, and sorry for my English.

2 Answers:

Answer 0: (score: 0)

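What probably makes it slow is that some of the PAGE tables have no index on page_id.

A full table scan on those PAGE tables would then make it slow.

Possibly even better than an index on page_id would be a composite index on book_id and page_id.

So the SQL below is nothing special, just a simple rewrite that will give the same performance.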

IF OBJECT_ID('tempdb..#tmpPageIds') IS NOT NULL DROP TABLE #tmpPageIds;

CREATE TABLE #tmpPageIds (id int primary key);
insert into #tmpPageIds values (88),(23),(11),(15),(56);

DECLARE @Amount INT = (select count(*) from #tmpPageIds);

select 
 b.book_id, 
 b.date, 
 l.adress as library_adress, 
 l.owner as library_owner
from (
  select book_id
  from (
    select distinct book_id, page_id from PAGESLLV t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
    union all
    select distinct book_id, page_id from PAGESXXN t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
    union all
    select distinct book_id, page_id from PAGESTTY t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
    union all
    select distinct book_id, page_id from PASGESPOO t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
    union all
    select distinct book_id, page_id from PAGESMIO t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
    union all
    select distinct book_id, page_id from PAGESQWW t with (nolock)
    join #tmpPageIds tmp on (t.page_id = tmp.id)
  ) q1
  group by book_id
  having count(distinct page_id) = @Amount
) q2
join BOOKS b on (q2.book_id = b.book_id)
join LIBRARY l on (b.library_id = l.library_id);
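
As a rough illustration of the indexing suggestion at the top of this answer, the composite index could be created along these lines. The index name is made up, and putting page_id first is an assumption (it matches the filter on page_id); whether that order beats (book_id, page_id) on your data is worth checking against the actual execution plan. The same statement would be repeated for each PAGE table.

```sql
-- Hypothetical index name. The column order (page_id, book_id) is an
-- assumption: the query filters on page_id and then groups by book_id.
CREATE NONCLUSTERED INDEX IX_PAGESLLV_page_id_book_id
    ON PAGESLLV (page_id, book_id);
```

Because the index contains both columns the inner selects read, it may let the server answer them from the index alone instead of touching the base table.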

Answer 1: (score: 0)

I've been playing around to see how well this would perform. So I think the best approach is as follows:

With CTE(RealID,ID,Name) as (
select * from Table_1a
Union all
select * from Table_1b
)

select * from CTE where RealID in (1,43)

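Here the CTE in your case would be all of your page tables. If you make sure BookId is indexed in each table (probably best to make it the clustered index), you should see a fairly decent execution plan. I only tested with about 140k records per table, but it did work well for me.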
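
Applied to the tables from the question, the same pattern might look roughly like the sketch below. It assumes the PAGE tables exist as described in the question, shows only two of them, and uses the example list [88, 23, 11, 15]. The CTE name AllPages is made up for illustration.

```sql
-- Sketch only: table and column names are taken from the question,
-- and AllPages is a hypothetical CTE name.
WITH AllPages (book_id, page_id) AS (
    SELECT book_id, page_id FROM PAGESLLV
    UNION ALL
    SELECT book_id, page_id FROM PAGESXXN
    -- add one UNION ALL branch per remaining PAGE table
)
SELECT book_id
FROM AllPages
WHERE page_id IN (88, 23, 11, 15)
GROUP BY book_id
-- keep only books that contain every requested page id
HAVING COUNT(DISTINCT page_id) = 4;
```

The 4 in the HAVING clause is the number of ids in the list, so it has to change whenever the list does.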