Split a comma-separated string and select the top "SearchTags" ordered by count

Time: 2014-08-20 20:20:18

Tags: c# sql sql-server linq linq-to-sql

I am working with the following data set:

ID  SearchTags
1   Cats,Birds,Dogs,Snakes,Roosters
2   Mice,Chickens,Cats,Lizards
3   Birds,Zebras,Sheep,Horses,Monkeys,Chimps
4   Lions,Tigers,Bears,Chickens
5   Cats,Goats,Pandas
6   Birds,Zebras,Sheep,Horses
7   Rats,Dogs,Hawks,Eagles,Tigers
8   Cats,Tigers,Dogs,Pandas
9   Dogs,Beavers,Sharks,Vultures
10  Cats,Bears,Bats,Leopards,Chickens

I need to query for a list of the most popular SearchTags.

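I have a query that returns the most popular SearchTags, but it returns each row's entire comma-separated list (which I expected). Is it possible to split the SearchTags column on the comma (,) and produce a list of the most-used individual tags, so that I end up with a list/count like this?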

Cats    5
Dogs    4
Chickens    3
Tigers  3
Bears   2
Sharks  1
etc...

Instead of what I get now:

Cats,Birds,Dogs,Snakes,Roosters 1
Dogs,Beavers,Sharks,Vultures    1
Cats,Bears,Bats,Leopards,Chickens 1
etc...

Here is the query that returns the full lists:

SELECT SearchTags, COUNT(*) AS TagCount
FROM Animals
GROUP BY SearchTags
ORDER BY TagCount DESC

I am using SQL Server. I would prefer a query, but I can create a stored procedure if necessary.

Thanks for any help you can provide.

2 Answers:

Answer 0: (score: 2)

You have tagged the question with C# and LINQ. If you have the data in a DataTable, you can do the following:

DataTable dt = GetDataTableFromDB();
var query = dt.AsEnumerable()
               .Select(r => r.Field<string>("SearchTags").Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });

If you have LINQ to SQL set up, you can do this:

var query = db.YourTable
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Where(r=> !string.IsNullOrWhiteSpace(r))
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });

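The AsEnumerable() call switches the rest of the query to LINQ to Objects, so all SearchTags values are loaded into memory first. Split can then be applied there, since LINQ to SQL cannot translate string.Split into SQL.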

You can also filter out null or empty SearchTags values on the database side, like this:

var query = db.YourTable
               .Where(r=> r.SearchTags != null && r.SearchTags.Trim() != "")
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });

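The query above filters out null, empty, and whitespace-only values on the database side, so fewer rows are returned and it can be more efficient.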

To also filter by date (here, rows from the last 14 days):

DateTime dt = DateTime.Today.AddDays(-14);
var query = db.YourTable
               .Where(r=> r.SearchTags != null && 
                      r.SearchTags.Trim() != "" &&
                      r.MediaDate >= dt)
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });
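
The queries above return the tag counts unordered, while the question asks for the most popular tags first. Below is a minimal, self-contained sketch of the same split-and-group idea with ordering added. The names TagCounter and TopTags are hypothetical, and trimming whitespace around each tag and grouping case-insensitively are assumptions that go beyond the original queries; drop them if tags must match exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class TagCounter
{
    // Splits each comma-separated row into tags and returns (tag, count)
    // pairs ordered by count descending, then by tag name, limited to 'top'.
    public static List<KeyValuePair<string, int>> TopTags(IEnumerable<string> rows, int top)
    {
        return rows
            .Where(r => !string.IsNullOrWhiteSpace(r))           // skip null/blank rows
            .SelectMany(r => r.Split(','))                       // one element per tag
            .Select(t => t.Trim())                               // assumption: ignore stray spaces
            .Where(t => t.Length > 0)                            // skip empty tags such as "a,,b"
            .GroupBy(t => t, StringComparer.OrdinalIgnoreCase)   // assumption: "cats" == "Cats"
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)                     // most popular first
            .ThenBy(p => p.Key, StringComparer.OrdinalIgnoreCase) // stable order for ties
            .Take(top)
            .ToList();
    }
}
```

With the ten rows from the question, TagCounter.TopTags(rows, 5) should give Cats 5, Dogs 4, then Birds, Chickens and Tigers with 3 each. With LINQ to SQL, the rows argument could be db.YourTable.Select(r => r.SearchTags).AsEnumerable(), so the split still happens in memory as described above.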

Answer 1: (score: 0)

Assuming you want T-SQL...

There are many T-SQL functions for splitting strings, but anything that uses XQuery tends to be the fastest, as opposed to the loop-heavy functions.

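I use something similar to this on a production system, against tables with 10-15K CSV values, and it runs in a few seconds, whereas the old looping function sometimes took a minute.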

Anyway, here is a quick demo to get you going.

DECLARE @DATA TABLE (ID INT, SEARCHTAGS VARCHAR(100))
INSERT INTO @DATA
SELECT 1,'Cats,Birds,Dogs,Snakes,Roosters' UNION ALL
SELECT 2,'Mice,Chickens,Cats,Lizards' UNION ALL
SELECT 3,'Birds,Zebras,Sheep,Horses,Monkeys,Chimps' UNION ALL
SELECT 4,'Lions,Tigers,Bears,Chickens' UNION ALL
SELECT 5,'Cats,Goats,Pandas' UNION ALL
SELECT 6,'Birds,Zebras,Sheep,Horses' UNION ALL
SELECT 7,'Rats,Dogs,Hawks,Eagles,Tigers' UNION ALL
SELECT 8,'Cats,Tigers,Dogs,Pandas' UNION ALL
SELECT 9,'Dogs,Beavers,Sharks,Vultures' UNION ALL
SELECT 10,'Cats,Bears,Bats,Leopards,Chickens'

;WITH TagList AS
(
SELECT ID, Split.a.value('.', 'VARCHAR(max)') AS String
FROM  (SELECT ID, 
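              -- No CAST(... AS VARCHAR) on SEARCHTAGS: without a length it defaults to 30 characters and truncates longer tag lists
              CAST ('<M>' + REPLACE(SEARCHTAGS, ',', '</M><M>') + '</M>' AS XML) AS String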
       FROM @DATA) AS A 
CROSS APPLY String.nodes ('/M') AS Split(a)
)

SELECT TOP (10) String, COUNT(*) AS [SearchCount]
FROM TagList
GROUP BY String
ORDER BY [SearchCount] DESC

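Note: if you can handle it in C#, anything involving string manipulation is almost always faster there... so Habib's answer is probably more efficient than the T-SQL solution.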