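I basically have a table with a column. Let's call the column 'Summary'.
So if 'Summary' looked like this: I went to the park to find a dog. The dog was not there. I left because there was no dog.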
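I want to be able to return a list that gives me the repeated words along with the number of times each one occurs. I don't know in advance which words will be repeated, so I can't hard-code them into the SQL query.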
The result I need is "Dog" - 3, "The" - 2, "I" - 2
I can't post images, so I can't post the table.
Answer 0 (score: 1)
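This isn't necessarily a very efficient way to get the result you're looking for, but it will output the list of words that occur 2 or more times in the given summary: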
DECLARE @summary NVARCHAR(MAX)
SET @summary = N'I went to the park to find a dog. The dog was not there. I left because there was no dog.'
SET NOCOUNT ON
DECLARE @PosA INT
DECLARE @Word NVARCHAR(MAX)
-- A temporary table to hold matches
CREATE TABLE dbo.#WordList
(
Word NVARCHAR(MAX),
WordCount INT
)
SET @PosA = 0
WHILE (LEN(@summary) > 0)
BEGIN
-- Find the position of the word end
SET @PosA = CHARINDEX(' ', @summary)
IF (@PosA = 0)
SET @PosA = LEN(@summary) + 1
-- Extract the word and shorten the summary text
SET @Word = LEFT(@summary, @PosA - 1)
IF (@PosA < LEN(@summary))
SET @summary = SUBSTRING(@summary, @PosA + 1, LEN(@summary) - @PosA)
ELSE
SET @summary = ''
-- Strip punctuation
SET @Word = REPLACE(REPLACE(@Word, '.', ''), ',', '')
-- Add or update the word, skipping empty strings left by consecutive spaces or stray punctuation
IF (@Word <> N'')
BEGIN
IF EXISTS ( SELECT TOP 1 1 FROM dbo.#WordList WHERE Word = @Word)
UPDATE dbo.#WordList
SET WordCount = WordCount + 1
WHERE (Word = @Word)
ELSE
INSERT INTO dbo.#WordList (Word, WordCount)
VALUES (@Word, 1)
END
END
-- Get results
SELECT *
FROM dbo.#WordList
WHERE (WordCount > 1)
ORDER BY Word
--- Tidy up
DROP TABLE dbo.#WordList
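In effect, it splits the summary text on each space and then strips punctuation from the resulting words. The words are stored in the #WordList temporary table, and each word's count is incremented as it is found again.
The results are returned at the end.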
Note that you may want to improve the punctuation removal, since for this answer I only strip periods and commas.
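One way to extend the punctuation step is to loop over a list of characters and REPLACE each one in turn. This is a minimal sketch: @Punct and @i are hypothetical names, the character list is an assumption you should adjust, and @Word is given a sample value only so the snippet runs on its own:

```sql
-- Sample token; inside the answer's loop, @Word already holds the current word
DECLARE @Word NVARCHAR(MAX) = N'"dog!"';
-- Hypothetical list of characters to strip; extend as needed
DECLARE @Punct NVARCHAR(50) = N'.,;:!?"()';
DECLARE @i INT;

-- Remove every occurrence of each punctuation character, one character at a time
SET @i = 1;
WHILE (@i <= LEN(@Punct))
BEGIN
    SET @Word = REPLACE(@Word, SUBSTRING(@Punct, @i, 1), N'');
    SET @i = @i + 1;
END

SELECT @Word AS Word;  -- returns dog
```

To use it in the answer's script, declare @Punct and @i once before the outer WHILE loop, and put the SET @i = 1 line and the inner loop where the REPLACE(REPLACE(...)) line is.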
Answer 1 (score: 0)
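I think that for each row you need to split the summary column into separate rows. Then you can select from that result set, counting each value. Here is a link to some good split functions: Split functions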
They're old, but still very effective. I think a TVF like this one will get you going:
CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
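To get the counts the question asks for, the function can be applied to each row with CROSS APPLY and the pieces grouped. This is a minimal sketch. It assumes dbo.Split above already exists in the current database, uses a hypothetical table variable @summaries in place of the real table, and strips only periods and commas before splitting:

```sql
-- Hypothetical stand-in for the real table; dbo.Split (above) must already exist
DECLARE @summaries TABLE (id int, summary varchar(512));
INSERT @summaries VALUES
(1, 'I went to the park to find a dog. The dog was not there. I left because there was no dog.');

-- Collects one row per repeated word per summary
DECLARE @hits TABLE (id int, word varchar(512), hits int);

INSERT @hits (id, word, hits)
SELECT t.id, p.s, COUNT(*)
FROM @summaries AS t
-- Strip periods and commas so that "dog." and "dog" are counted together
CROSS APPLY (SELECT REPLACE(REPLACE(t.summary, '.', ''), ',', '') AS clean) AS c
CROSS APPLY dbo.Split(' ', c.clean) AS p
WHERE p.s <> ''           -- ignore empty pieces produced by consecutive spaces
GROUP BY t.id, p.s
HAVING COUNT(*) > 1
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels for the CTE in dbo.Split

SELECT id, word, hits FROM @hits ORDER BY hits DESC, word;
```

Under a case-insensitive collation, the sample row yields dog with 3 hits and I, the, there, to and was with 2 each. Because dbo.Split is an inline TVF, the MAXRECURSION hint has to go on the calling query rather than inside the function. Also note that its varchar(512) parameter truncates longer summaries.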
Answer 2 (score: 0)
DECLARE @summaries TABLE (id int, summary nvarchar(max))
INSERT @summaries values
(1,N'I went to the park to find a dog. The dog was not there. I left because there was no dog.')
SELECT id, word, COUNT(*) c
FROM @summaries
CROSS APPLY (SELECT CAST('<a>'+REPLACE(summary,' ','</a><a>')+'</a>' AS xml) xml1 ) t1
CROSS APPLY (SELECT n.value('.','varchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2
GROUP BY id, word
HAVING COUNT(*) > 1
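This query turns each summary into an XML fragment with one <a> element per space-separated word, shreds it with nodes() and value(), and then counts with GROUP BY/HAVING. As written, punctuation stays attached (so "dog." and "dog" are different words), and characters such as & or < make the CAST to xml fail. A sketch of a variant that addresses both, assuming only periods and commas need stripping (@hits is a hypothetical results table):

```sql
DECLARE @summaries TABLE (id int, summary nvarchar(max));
INSERT @summaries VALUES
(1, N'I went to the park to find a dog. The dog was not there. I left because there was no dog.');

-- Collects one row per repeated word per summary
DECLARE @hits TABLE (id int, word nvarchar(4000), hits int);

INSERT @hits (id, word, hits)
SELECT s.id, t2.word, COUNT(*)
FROM @summaries AS s
-- Strip periods and commas so that "dog." and "dog" fall into the same group
CROSS APPLY (SELECT REPLACE(REPLACE(s.summary, N'.', N''), N',', N'') AS clean) AS c
-- FOR XML PATH('') with [text()] entitizes &, < and > so the CAST to xml does not fail on them
CROSS APPLY (SELECT CAST(N'<a>' + REPLACE((SELECT c.clean AS [text()] FOR XML PATH('')), N' ', N'</a><a>') + N'</a>' AS xml) AS xml1) AS t1
CROSS APPLY (SELECT n.value('.', 'nvarchar(4000)') AS word FROM xml1.nodes('a') AS x(n)) AS t2
WHERE t2.word <> N''      -- skip empty elements produced by consecutive spaces
GROUP BY s.id, t2.word
HAVING COUNT(*) > 1;

SELECT id, word, hits FROM @hits ORDER BY hits DESC, word;
```

value() decodes the entities again, so a word such as "R&D" is counted as written.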