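This question has been asked several times, but I can't find the specific answer I need. I have a query that finds the most common words in a column in SQL Server and lists how often each one occurs. The problem is that if a word appears several times in the same row, every occurrence is counted. I want each word to be counted only once per row.
So a row with the value "to be or not to be" should count "to" and "be" once each toward the total frequency, not twice.
Here is the current query. It also strips out common words such as pronouns and replaces all the common delimiters with spaces. It is a bit old, so I suspect it could be tidier.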
SELECT sep.Col Phrase, count(*) as Qty
FROM (
Select * FROM (
Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Title, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' '))))
FROM Table
) easyValues
Where value <> ''
) actualValues
Cross Apply dbo.SeparateValues(value, ' ') sep
WHERE sep.Col not in ('', 'THE', 'A', 'AN', 'WHO', 'BOOK', 'AND', 'FOR', 'ON', 'HAVE', 'YOUR', 'HOW', 'WE', 'IN', 'I', 'IT', 'BY', 'SO', 'THEIR', 'IS', 'OR', 'HE', 'OF', 'WHAT'
, 'HIM', 'HIS', 'SHE', 'HER', 'MY', 'FROM', 'US', 'OUR', 'AT', 'ALL', 'BE', 'OF', 'TO', 'YOU', 'WITH', 'THAT', 'THIS', 'WAS', 'ARE', 'THERE', 'BUT', 'HAS'
, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'WILL', 'MORE', 'DIV', 'THAN', 'EACH', 'GET', 'ANY')
and LEN(sep.Col) > 2
GROUP By sep.Col
HAVING count(*) > 1
Besides a fix for the repeated-word problem, any ideas for a better approach would be appreciated.
Answer 0 (score: 2)
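You just need to GROUP BY twice.
First, group by sep.Col and Table.ID to remove duplicates within a row. Your table does have some ID column, right?
Second, group by sep.Col alone to get the final count.
I also rewrote your query with CTEs to make it readable. At least to me, it is more readable this way.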
WITH
easyValues
AS
(
Select
ID
,value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Title, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' '))))
FROM Table
)
,actualValues
AS
(
SELECT
ID
,Value
FROM easyValues
Where value <> ''
)
,SeparateValues
AS
(
SELECT
ID
,sep.Col
FROM
actualValues
Cross Apply dbo.SeparateValues(value, ' ') AS sep
WHERE
sep.Col not in ('', 'THE', 'A', 'AN', 'WHO', 'BOOK', 'AND', 'FOR', 'ON', 'HAVE', 'YOUR', 'HOW', 'WE', 'IN', 'I', 'IT', 'BY', 'SO', 'THEIR', 'IS', 'OR', 'HE', 'OF', 'WHAT'
, 'HIM', 'HIS', 'SHE', 'HER', 'MY', 'FROM', 'US', 'OUR', 'AT', 'ALL', 'BE', 'OF', 'TO', 'YOU', 'WITH', 'THAT', 'THIS', 'WAS', 'ARE', 'THERE', 'BUT', 'HAS'
, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'WILL', 'MORE', 'DIV', 'THAN', 'EACH', 'GET', 'ANY')
and LEN(sep.Col) > 2
)
,UniqueValues
AS
(
SELECT
ID, Col
FROM
SeparateValues
GROUP BY
ID, Col
)
SELECT
Col AS Phrase
,count(*) as Qty
FROM UniqueValues
GROUP By Col
HAVING count(*) > 1
;
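The two GROUP BY steps can also be collapsed into a single aggregate with COUNT(DISTINCT ID), which counts the rows that contain a word rather than the occurrences of the word. The following is only a minimal sketch under stated assumptions: the table variable @Titles and its sample rows are hypothetical, and the built-in STRING_SPLIT (SQL Server 2016 or later, compatibility level 130+) stands in for dbo.SeparateValues, whose definition is not shown in the question.

```sql
-- Hypothetical sample data; @Titles stands in for the real table.
DECLARE @Titles TABLE (ID INT PRIMARY KEY, Title NVARCHAR(200));
INSERT INTO @Titles (ID, Title) VALUES
    (1, N'TO BE OR NOT TO BE'),
    (2, N'NOT TO WORRY'),
    (3, N'WORRY WORRY WORRY');

-- STRING_SPLIT is an assumed replacement for dbo.SeparateValues.
SELECT
    s.value AS Phrase,
    COUNT(DISTINCT t.ID) AS Qty      -- rows containing the word, not occurrences
FROM @Titles AS t
CROSS APPLY STRING_SPLIT(t.Title, N' ') AS s
WHERE s.value <> N''
GROUP BY s.value
ORDER BY Qty DESC, Phrase;
-- NOT, TO and WORRY each get Qty = 2; BE and OR get Qty = 1.
```

Whether this or the explicit two-level grouping performs better depends on the data, so it is worth comparing the execution plans for both.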
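Answer 1 (score: 1)
To meet your requirement, you can use a function that splits a string into a list of words on the space delimiter ' '. With that function in place, you can then use some procedural T-SQL, such as a cursor, to get the final counts.
First create the function as follows (code source: Stack Overflow):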
CREATE FUNCTION dbo.splitstring ( @stringToSplit VARCHAR(MAX) )
RETURNS @returnList TABLE ([Word] [nvarchar] (500))
AS
BEGIN
DECLARE @name NVARCHAR(255)
DECLARE @pos INT
WHILE CHARINDEX(' ', @stringToSplit) > 0
BEGIN
SELECT @pos = CHARINDEX(' ', @stringToSplit)
SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1)
INSERT INTO @returnList
SELECT @name
SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
END
INSERT INTO @returnList
SELECT @stringToSplit
RETURN
END
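As a quick sanity check, the function can be called on its own before it is used in the cursor. This sketch assumes dbo.splitstring has already been created in the current database exactly as shown above; the input string is just an illustration.

```sql
-- Assumes dbo.splitstring from the answer above already exists.
SELECT Word
FROM dbo.splitstring('TO BE OR NOT TO BE');
-- Six rows: TO, BE, OR, NOT, TO, BE (duplicates kept; row order is not guaranteed).

SELECT DISTINCT Word
FROM dbo.splitstring('TO BE OR NOT TO BE');
-- Four rows: BE, NOT, OR, TO (row order is not guaranteed).
```

The second query shows why the cursor below selects DISTINCT words: that is what limits each word to one count per row.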
Then use this cursor script to get the final output:
DECLARE @Value VARCHAR(MAX)
DECLARE @WordList TABLE
(
Word VARCHAR(200)
)
DECLARE db_cursor CURSOR
FOR
SELECT Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace
(Replace(Replace(Replace(Replace(Replace(Replace(Replace
(Replace(Replace(title, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' ')
, '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) [Value]
FROM table
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @Value
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @WordList
SELECT DISTINCT Word FROM [dbo].[splitstring](@Value)
WHERE Word NOT IN ('', 'THE', 'A', 'AN', 'WHO', 'BOOK', 'AND', 'FOR', 'ON', 'HAVE', 'YOUR', 'HOW', 'WE', 'IN', 'I', 'IT', 'BY', 'SO', 'THEIR', 'IS', 'OR', 'HE', 'OF', 'WHAT'
, 'HIM', 'HIS', 'SHE', 'HER', 'MY', 'FROM', 'US', 'OUR', 'AT', 'ALL', 'BE', 'OF', 'TO', 'YOU', 'WITH', 'THAT', 'THIS', 'WAS', 'ARE', 'THERE', 'BUT', 'HAS'
, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'WILL', 'MORE', 'DIV', 'THAN', 'EACH', 'GET', 'ANY')
AND LEN(Word) > 2
FETCH NEXT FROM db_cursor INTO @Value
END
CLOSE db_cursor
DEALLOCATE db_cursor
SELECT Word,COUNT(*)
FROM @WordList
GROUP BY Word
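If the cursor turns out to be slow on a large table, the same per-row DISTINCT can probably be expressed as one set-based statement with CROSS APPLY. This is a hedged sketch, not part of the original answer: it assumes dbo.splitstring exists as defined above, uses a hypothetical table variable @Titles in place of the real table, and leaves out the stop-word filter for brevity.

```sql
-- Hypothetical sample data; @Titles stands in for the real table.
DECLARE @Titles TABLE (Title VARCHAR(200));
INSERT INTO @Titles (Title) VALUES
    ('TO BE OR NOT TO BE'),
    ('NOT TO WORRY');

SELECT
    w.Word,
    COUNT(*) AS Qty
FROM @Titles AS t
CROSS APPLY (
    -- One row per distinct word within this title
    SELECT DISTINCT Word
    FROM dbo.splitstring(t.Title)
) AS w
GROUP BY w.Word;
-- NOT and TO get Qty = 2; BE, OR and WORRY get Qty = 1.
```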
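Answer 2 (score: 1)
As far as I can tell, the STRING_SPLIT function together with CROSS APPLY gives you what you need. You split each string on the space delimiter, select the distinct words for that row, and then count them in the outer query. For brevity, I have left out the part that excludes specific words.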
CREATE TABLE phrases(phrase NVARCHAR(MAX));
INSERT INTO phrases(phrase)VALUES(N'To be or not to be'),(N'this is not a phrase'),(N'And why is this not another one');
SELECT
w.value,
COUNT(*)
FROM
phrases AS p
CROSS APPLY (
SELECT DISTINCT
value
FROM
STRING_SPLIT(p.phrase,N' ')
) AS w
GROUP BY
w.value;
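The omitted filtering could be added inside the CROSS APPLY, roughly as below. This is a sketch under assumptions: the stop-word list is deliberately shortened from the one in the question, the table variable @phrases replaces the phrases table so the example is self-contained, and STRING_SPLIT requires SQL Server 2016 or later with compatibility level 130+.

```sql
DECLARE @phrases TABLE (phrase NVARCHAR(MAX));
INSERT INTO @phrases (phrase) VALUES
    (N'To be or not to be'),
    (N'this is not a phrase'),
    (N'And why is this not another one');

SELECT
    w.word AS Phrase,
    COUNT(*) AS Qty
FROM @phrases AS p
CROSS APPLY (
    -- Distinct, upper-cased words of this row, minus the stop words
    SELECT DISTINCT UPPER(value) AS word
    FROM STRING_SPLIT(p.phrase, N' ')
    WHERE UPPER(value) NOT IN (N'', N'THE', N'AND', N'THIS')  -- abbreviated list
      AND LEN(value) > 2
) AS w
GROUP BY w.word
HAVING COUNT(*) > 1;
-- With this sample data the only row returned is NOT with Qty = 3.
```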