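I have a SQL table of news articles, and each article can appear in several categories. Unfortunately, the categories are stored as concatenated text values in a single varchar on each row.
I want to keep the five most recent news articles in each category and remove the older ones. I don't think this can be done without procedural code (a SQL loop/cursor, or an external program that knows all the possible category names and calls SQL repeatedly).
Here is my test data, minus the article titles/content. I believe the code first needs to strip the unwanted category strings, and then delete every row that has had all of its categories removed.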
declare @News table(ArticleId INTEGER NOT NULL, DateAdded SMALLDATETIME NOT NULL, Categories VARCHAR(250) NOT NULL)
insert into @News values (11, '2014-01-11', 'SPORT~CELEBS~')
insert into @News values (10, '2014-01-10', 'SPORT~CELEBS~POLITICS~')
insert into @News values (9, '2014-01-09', 'SPORT~CELEBS~')
insert into @News values (8, '2014-01-08', 'SPORT~NATURE~')
insert into @News values (7, '2014-01-07', 'SPORT~CELEBS~')
insert into @News values (6, '2014-01-06', 'SPORT~CELEBS~POLITICS~') --ought to have SPORT label removed
insert into @News values (5, '2014-01-05', 'POLITICS~')
insert into @News values (4, '2014-01-04', 'POLITICS~')
insert into @News values (3, '2014-01-03', 'POLITICS~')
insert into @News values (2, '2014-01-02', 'POLITICS~') --ought to get deleted
insert into @News values (1, '2014-01-01', 'CELEBS~') --ought to get deleted
--magic happens
delete from @News where Categories = ''
select * from @News order by DateAdded desc
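If the only solution is to use WHILE or CURSOR, then I would opt to wrap the SQL in a stored procedure and call it repeatedly with the value 'SPORT~', then 'CELEBS~', then 'POLITICS~', and so on.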
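For reference, the per-category fallback described above could look roughly like the following for a single category value. This is only a sketch: @Cat and @KeepN are hypothetical parameter names, the sample rows are cut down so the effect is easy to see, and it assumes category names contain no LIKE wildcard characters (%, _, [).

```sql
-- Sketch of the per-category fallback; run once per category value.
declare @News table(ArticleId INTEGER NOT NULL, DateAdded SMALLDATETIME NOT NULL, Categories VARCHAR(250) NOT NULL)
insert into @News values (3, '2014-01-03', 'SPORT~CELEBS~')
insert into @News values (2, '2014-01-02', 'SPORT~')
insert into @News values (1, '2014-01-01', 'SPORT~POLITICS~')

declare @Cat VARCHAR(50) = 'SPORT~' -- hypothetical parameter: category to trim, with its trailing '~'
declare @KeepN INT = 2              -- hypothetical parameter: 5 in the question, 2 to suit this tiny sample

-- number the articles in this category newest-first, then strip the
-- category from every article beyond the first @KeepN
;with ranked as (
    select Categories,
           ROW_NUMBER() over (order by DateAdded desc) as rn
    from @News
    -- the leading '~' stops 'SPORT~' from matching inside e.g. 'MOTORSPORT~'
    where '~' + Categories like '%~' + @Cat + '%'
)
update ranked
set Categories = SUBSTRING(REPLACE('~' + Categories, '~' + @Cat, '~'), 2, 250)
where rn > @KeepN

-- rows left with no categories at all can now go
delete from @News where Categories = ''
select * from @News order by DateAdded desc
```

With this sample, article 1 should be left with 'POLITICS~' while articles 3 and 2 keep their SPORT label.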
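Answer 0 (score: 0)
I have found a partial (and very inelegant) solution. The approach is to recreate the 'glue' table that would exist if this were a sane database (although a proper table would have two FKs).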
--create list of all possible category values (get first category from every row, then second, then third, etc)
declare @Category table (SingleCategory VARCHAR(50))
insert into @Category
select distinct LEFT(SingleCategory, charindex('~', SingleCategory))
from (
select categories as SingleCategory from @News
union
select SUBSTRING(categories, charindex('~', categories)+1, 100) from @News where Categories like '%~%~'
union
select SUBSTRING(categories, charindex('~', categories, charindex('~', categories)+1)+1, 100) from @News where Categories like '%~%~%~'
--repeat if 4 and 5 occurrences possible, etc
) sq
--create a 'glue' table
declare @Glue table(ArticleId INT NOT NULL, DateAdded SMALLDATETIME NOT NULL, Category VARCHAR(50) NOT NULL)
insert into @Glue
select articleid, dateadded, SingleCategory
from @News n
inner join @Category c on n.categories LIKE '%' + c.SingleCategory + '%'
--use the glue table to identify the articles we do want, and delete all the others
delete from @News where ArticleId not in (
SELECT articleid
FROM (
SELECT articleid, Category,
RANK() OVER(PARTITION BY Category ORDER BY dateadded DESC) AS RankThem
FROM @Glue
) sq
WHERE RankThem <= 5
)
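This gets rid of the two rows we don't want, but we still end up with 6 articles in the SPORT category (the delete only removes whole rows and never edits the Categories string), so it isn't a perfect solution. Is there a better way?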
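One possible set-based alternative, offered as a sketch rather than a tested answer: split each Categories string into one row per (article, category), number the articles newest-first within each category, and rebuild each string from the pairs that rank in the top five. It assumes SQL Server 2017 or later (compatibility level 130+), because it relies on STRING_SPLIT and STRING_AGG. The @Keep table and the rn column are illustrative names, and ROW_NUMBER is used in place of RANK so that ties on DateAdded cannot let a sixth article through.

```sql
-- Sketch only: assumes SQL Server 2017+ for STRING_SPLIT and STRING_AGG.
declare @News table(ArticleId INTEGER NOT NULL, DateAdded SMALLDATETIME NOT NULL, Categories VARCHAR(250) NOT NULL)
insert into @News values
    (11, '2014-01-11', 'SPORT~CELEBS~'),
    (10, '2014-01-10', 'SPORT~CELEBS~POLITICS~'),
    (9,  '2014-01-09', 'SPORT~CELEBS~'),
    (8,  '2014-01-08', 'SPORT~NATURE~'),
    (7,  '2014-01-07', 'SPORT~CELEBS~'),
    (6,  '2014-01-06', 'SPORT~CELEBS~POLITICS~'),
    (5,  '2014-01-05', 'POLITICS~'),
    (4,  '2014-01-04', 'POLITICS~'),
    (3,  '2014-01-03', 'POLITICS~'),
    (2,  '2014-01-02', 'POLITICS~'),
    (1,  '2014-01-01', 'CELEBS~')

-- (article, category) pairs that survive: the 5 newest articles per category
declare @Keep table(ArticleId INT NOT NULL, Category VARCHAR(50) NOT NULL)
insert into @Keep (ArticleId, Category)
select ArticleId, Category
from (
    select n.ArticleId,
           s.value as Category,
           ROW_NUMBER() over (partition by s.value
                              order by n.DateAdded desc, n.ArticleId desc) as rn
    from @News n
    cross apply STRING_SPLIT(n.Categories, '~') s
    where s.value <> ''   -- the trailing '~' yields one empty element per row
) ranked
where rn <= 5

-- rebuild each Categories string from the surviving pairs;
-- articles with no surviving pair end up with an empty string
update n
set Categories = ISNULL(k.Rebuilt, '')
from @News n
left join (
    select ArticleId, STRING_AGG(Category, '~') + '~' as Rebuilt
    from @Keep
    group by ArticleId
) k on k.ArticleId = n.ArticleId

delete from @News where Categories = ''
select * from @News order by DateAdded desc
```

With the test data, this should strip SPORT from article 6 and delete articles 1 and 2. The order of the categories inside a rebuilt string is not guaranteed to match the original; on older versions of SQL Server the split and the re-concatenation would need other techniques, such as a numbers table and FOR XML PATH.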