I have the following table structure:
Tags:
Tag_ID | Name
1 | Tag1
2 | Tag2
3 | Tag3
4 | Tag4
5 | Tag5
6 | Tag6
Posts:
Post_ID | Title | Body
1 | Post1 | Post1
2 | Post2 | Post2
3 | Post3 | Post3
4 | Post4 | Post4
5 | Post5 | Post5
6 | Post6 | Post6
7 | Post7 | Post7
8 | Post8 | Post8
9 | Post9 | Post9
10 | Post10 | Post10
TagsPosts:
Tag_ID | Post_ID
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
1 | 10
1 | 1
2 | 1
2 | 2
2 | 6
2 | 7
3 | 4
3 | 8
3 | 9
4 | 7
5 | 1
5 | 2
5 | 3
5 | 4
5 | 5
5 | 6
5 | 7
6 | 2
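What I need returned from the query is the top 3 Posts of the most common Tag, plus the top 1 Post of each of the remaining Tags, without returning any duplicate Posts.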
Desired Output:
Tag_ID | Post_ID
5 | 1
5 | 2
5 | 3
1 | 10
2 | 6
3 | 9
4 | 7
So far I have been able to determine the top 3 Posts of the most common Tag using the following:
SELECT Top(3) t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID IN (
SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)
Result:
Tag_ID | Post_ID
5 | 1
5 | 2
5 | 3
I have also determined the top 1 Post of each of the remaining Tags using the following:
SELECT t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN (
SELECT t.Tag_ID, Max(p.Post_ID) as Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID NOT IN (
SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)
AND
p.Post_ID NOT IN (
SELECT Top(3) p.Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID IN (
SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC))
GROUP BY t.Tag_ID) as s ON t.Tag_ID = s.Tag_ID
INNER JOIN Posts as p ON s.Post_ID = p.Post_ID
Result:
Tag_ID | Post_ID
1 | 10
2 | 7
3 | 9
4 | 7
This is almost there, but as you can see it returns a duplicate Post (Post_ID 7 appears for both Tag 2 and Tag 4).
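By the way, I'm testing with SQL Server 2008 Express because I'm not familiar with MySQL, but I've been asked to come up with a SQL query that can be applied to a MySQL database. I figure that if I get the basic query working in T-SQL, converting it to whatever SQL dialect MySQL uses will be fairly simple.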
Answer 0 (score: 0)
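I would use a window function, put it in a CTE, and then reference it in a predicate, like this (using a simplified version of the data that you can run from SSMS). You listed SQL Server but not a version. I believe window functions work on SQL Server 2005 and later, but I'm not certain.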
declare @Tag table ( tagid int identity, name varchar(8));
insert into @Tag values ('Tag1'),('Tag2'),('Tag3'),('Tag4'),('Tag5'),('Tag6');
declare @Posts table (postid int identity, tagid int, postbody varchar(32));
insert into @Posts values (1,'Blah'),(1, 'Blahblah'),(2, 'Blahblah'),(3, 'Blahbodyblah'),(4, 'Blahblahblah'),(4, 'Blahbodyblah'),(4, 'Blah'),(5, 'Blah'),(5, 'Blahblah'),(6, 'Blahblah');
-- use a CTE
with a as
(
select
p.postbody
, count(t.tagid) as TimesTagged
/* You said you want posts returned by how often they occur. This numbers the rows by the
COUNT of tagid, descending (greatest first), starting from one. If ties matter to you,
consider RANK or DENSE_RANK instead. You would have to insert more values so that the third
position becomes a tie to see how ROW_NUMBER, RANK, and DENSE_RANK differ. Each has its
purpose, but you should know what you want before choosing which one to use.
*/
, row_number() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst
, Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_Ranking
, Dense_Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_DenseRanking
from @Tag t
join @Posts p on t.tagid = p.tagid
group by p.postbody
)
select *
from a
-- I only filter on the ROW_NUMBER column here; you can switch to one of the other ranking columns above if you wish.
where PositionOfCountsTaggedByGreatestOrderFirst <= 3
/*
You said you only want the top three counts.
Window functions are better than TOP, IMHO, because you can filter on explicit lists ('in'), medians, and
other positions directly rather than repeating nested selects. The only downside is that you cannot put
a predicate on a window function directly in the same select. You must compute it first and then filter on it
in a nested select, a CTE (as shown), a table variable, a temp table, etc.
*/
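Applied to the TagsPosts data from the question, the same idea might look like the sketch below. It is only one possible reading of the requirement, and it makes several assumptions that are mine, not the asker's: the duplicate (1, 1) row is collapsed with DISTINCT before counting, "top 3" means the three lowest Post_ID values, and each remaining tag prefers a post that the fewest other remaining tags could also claim (highest Post_ID on ties). That preference is a heuristic, not a guarantee, so a final step drops any post that is still picked twice. All CTE and column names (tp, tag_rank, top_posts, candidates, shared, picked, deduped, TagPos, PostPos, TagsSharing, Pick, Dup, Grp) are made up for illustration.

```sql
-- Sketch only: the question's TagsPosts rows, loaded into a table variable.
declare @TagsPosts table (Tag_ID int, Post_ID int);
insert into @TagsPosts values
 (1,1),(1,2),(1,3),(1,4),(1,5),(1,10),(1,1),
 (2,1),(2,2),(2,6),(2,7),
 (3,4),(3,8),(3,9),
 (4,7),
 (5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(5,7),
 (6,2);

with tp as (
    -- Assumption: collapse the repeated (1,1) row so it is not counted twice.
    select distinct Tag_ID, Post_ID from @TagsPosts
),
tag_rank as (
    -- Position of each tag by number of distinct posts, greatest first.
    select Tag_ID,
           row_number() over (order by count(*) desc, Tag_ID) as TagPos
    from tp
    group by Tag_ID
),
top_posts as (
    -- First three posts (lowest Post_ID) of the most common tag.
    select Tag_ID, Post_ID
    from (
        select tp.Tag_ID, tp.Post_ID,
               row_number() over (order by tp.Post_ID) as PostPos
        from tp
        join tag_rank r on r.Tag_ID = tp.Tag_ID
        where r.TagPos = 1
    ) x
    where PostPos <= 3
),
candidates as (
    -- Posts of every other tag that the top tag has not already used.
    select tp.Tag_ID, tp.Post_ID
    from tp
    join tag_rank r on r.Tag_ID = tp.Tag_ID
    where r.TagPos > 1
      and tp.Post_ID not in (select Post_ID from top_posts)
),
shared as (
    -- How many remaining tags could claim each candidate post.
    select Tag_ID, Post_ID,
           count(*) over (partition by Post_ID) as TagsSharing
    from candidates
),
picked as (
    -- Per tag, prefer the least contested post, then the highest Post_ID.
    select Tag_ID, Post_ID,
           row_number() over (partition by Tag_ID
                              order by TagsSharing, Post_ID desc) as Pick
    from shared
),
deduped as (
    -- Safety net: if two tags still picked the same post, keep the lowest Tag_ID.
    select Tag_ID, Post_ID,
           row_number() over (partition by Post_ID order by Tag_ID) as Dup
    from picked
    where Pick = 1
)
select Tag_ID, Post_ID
from (
    select 0 as Grp, Tag_ID, Post_ID from top_posts
    union all
    select 1 as Grp, Tag_ID, Post_ID from deduped where Dup = 1
) r
order by Grp, Tag_ID, Post_ID;
```

Against the sample rows this should return the seven rows listed under Desired Output, with Tag 6 left out because its only post (2) is already taken by Tag 5. The table variable is T-SQL only. On MySQL you would query the real TagsPosts table instead, and the CTEs and window functions need MySQL 8.0 or later.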