I have an MSSQL 2005 table:
[Companies](
[CompanyID] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](128),
[Description] [nvarchar](256),
[Keywords] [nvarchar](256)
)
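I want to generate a tag cloud for these companies, but I have stored all the keywords in a single comma-separated column. Any suggestions on how to generate a tag cloud from the most frequently used keywords? There may be millions of companies, each with roughly ten keywords.
Thanks.
Answer 0 (score: 4)
Step 1: Split the keywords out into a proper relation (table).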
CREATE TABLE Keywords (KeywordID int IDENTITY(1,1) NOT NULL
, Keyword NVARCHAR(256)
, constraint KeywordsPK primary key (KeywordID)
, constraint KeywordsUnique unique (Keyword));
Step 2: Map the many-to-many relationship between companies and tags into a separate table, as you would for any many-to-many relationship:
CREATE TABLE CompanyKeywords (
CompanyID int not null
, KeywordID int not null
, constraint CompanyKeywordsPK primary key (KeywordID, CompanyID)
, constraint CompanyKeyword_FK_Companies
foreign key (CompanyID)
references Companies(CompanyID)
, constraint CompanyKeyword_FK_Keywords
foreign key (KeywordID)
references Keywords (KeywordID));
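If the existing comma-separated data has to be moved into these two tables, a one-off migration along the following lines might work. This is only a sketch under assumptions: it splits each list by casting it to XML, so it assumes no keyword contains XML-special characters such as & or <, and the temp table name #SplitKeywords is made up for illustration. With millions of companies it would probably need to be run in batches.

```sql
-- Sketch only: assumes the Companies, Keywords and CompanyKeywords tables above.
-- #SplitKeywords is a hypothetical scratch table, not part of the original design.

-- 1. Split every comma-separated list into one row per (company, keyword).
SELECT c.CompanyID,
       LTRIM(RTRIM(x.I.value('.', 'NVARCHAR(256)'))) AS Keyword
INTO #SplitKeywords
FROM (SELECT CompanyID,
             CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS KeywordsXml
      FROM Companies
      WHERE Keywords IS NOT NULL) AS c
CROSS APPLY c.KeywordsXml.nodes('/I') AS x(I);

-- 2. Register each distinct, non-empty keyword once.
INSERT INTO Keywords (Keyword)
SELECT DISTINCT s.Keyword
FROM #SplitKeywords s
WHERE s.Keyword <> N''
  AND NOT EXISTS (SELECT 1 FROM Keywords k WHERE k.Keyword = s.Keyword);

-- 3. Link companies to keywords; DISTINCT guards against a keyword
--    listed twice for the same company.
INSERT INTO CompanyKeywords (CompanyID, KeywordID)
SELECT DISTINCT s.CompanyID, k.KeywordID
FROM #SplitKeywords s
JOIN Keywords k ON k.Keyword = s.Keyword;

DROP TABLE #SplitKeywords;
```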
Step 3: Generate the "cloud" with a simple GROUP BY query (for example, treating the "cloud" as the 100 most common tags):
with cte as (
SELECT TOP 100 KeywordID, count(*) as Count
FROM CompanyKeywords
group by KeywordID
order by count(*) desc)
select k.Keyword, c.Count
from cte c
join Keywords k on c.KeywordID = k.KeywordID;
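Step 4: Cache the result, since it changes rarely and is expensive to compute.
Answer 1 (score: 1)
I would prefer to normalize your design as suggested by Remus, but if you are at a point where you cannot change the design...
You can use a parsing function (the example I will use comes from here) to parse your keywords and count them.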
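One possible way to cache the result is a small table that a scheduled job refreshes. The sketch below assumes the Keywords and CompanyKeywords tables from the earlier steps; the table name TagCloudCache and its columns are hypothetical, and caching in the application layer would serve equally well.

```sql
-- Hypothetical cache table; the name and columns are illustrative only.
CREATE TABLE TagCloudCache (
    Keyword      NVARCHAR(256),
    CompanyCount INT NOT NULL,
    RefreshedAt  DATETIME NOT NULL
);
GO

-- Refresh step, e.g. run from a scheduled job. The transaction keeps
-- readers from seeing an empty cache halfway through the refresh.
BEGIN TRANSACTION;

DELETE FROM TagCloudCache;

INSERT INTO TagCloudCache (Keyword, CompanyCount, RefreshedAt)
SELECT k.Keyword, t.CompanyCount, GETDATE()
FROM (SELECT TOP 100 KeywordID, COUNT(*) AS CompanyCount
      FROM CompanyKeywords
      GROUP BY KeywordID
      ORDER BY COUNT(*) DESC) AS t
JOIN Keywords k ON k.KeywordID = t.KeywordID;

COMMIT TRANSACTION;
```

The page that renders the cloud would then read from TagCloudCache instead of running the aggregate on every request.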
CREATE FUNCTION [dbo].[fnParseStringTSQL] (@string NVARCHAR(MAX),@separator NCHAR(1))
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS
BEGIN
DECLARE @position int
SET @position = 1
SET @string = @string + @separator
WHILE charindex(@separator,@string,@position) <> 0
BEGIN
INSERT into @parsedString
SELECT substring(@string, @position, charindex(@separator,@string,@position) - @position)
SET @position = charindex(@separator,@string,@position) + 1
END
RETURN
END
go
create table MyTest (
id int identity,
keywords nvarchar(256)
)
insert into MyTest
(keywords)
select 'sql server,oracle,db2'
union
select 'sql server,oracle'
union
select 'sql server'
select k.string, COUNT(*) as count
from MyTest mt
cross apply dbo.fnParseStringTSQL(mt.keywords,',') k
group by k.string
order by count desc
drop function dbo.fnParseStringTSQL
drop table MyTest
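Applied to the table from the question rather than the MyTest demo table, the same approach might look like the sketch below. It assumes dbo.fnParseStringTSQL is still installed (that is, the cleanup DROP statements above were not run); the trimming and the TOP 100 cutoff are additions for illustration, not part of the original example.

```sql
-- Sketch: count keywords straight from Companies.Keywords.
-- Assumes dbo.fnParseStringTSQL from the example above still exists.
SELECT TOP 100
       LTRIM(RTRIM(k.string)) AS Keyword,
       COUNT(*) AS CompanyCount
FROM Companies c
CROSS APPLY dbo.fnParseStringTSQL(c.Keywords, ',') k
WHERE LTRIM(RTRIM(k.string)) <> N''      -- skip empty entries such as 'a,,b'
GROUP BY LTRIM(RTRIM(k.string))
ORDER BY COUNT(*) DESC;
```

Because the function is called once per row, this is likely to be slow on millions of companies, so the result is worth caching.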
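Answer 2 (score: 1)
Remus and Joe are both right, but as Joe said, if you have no choice then you have to live with it. I think I can offer you a simple solution using the XML data type. You can easily see the parsed column by running this query: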
WITH myCommonTblExp AS (
SELECT CompanyID,
CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
FROM Companies
)
SELECT CompanyID, RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
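Now that you know this works, all that remains is to group and count the keywords. However, you cannot GROUP BY on XML methods, so my suggestion is to create a view over the query above: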
CREATE VIEW [dbo].[DissectedKeywords]
AS
WITH myCommonTblExp AS (
SELECT
CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
FROM Companies
)
SELECT RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
GO
and run the count against that view:
SELECT Keywords, COUNT(*) AS KeyWordCount FROM DissectedKeywords
GROUP BY Keywords
ORDER BY Keywords
Anyway, here is the full article: http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/