How to return a list of repeated words in a table along with a count of occurrences

Time: 2014-02-24 20:48:28

Tags: sql-server tsql

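I basically have a table with a column. Let's call the column 'Summary'.

So suppose 'Summary' looks like this: I went to the park to find a dog. The dog was not there. I left because there was no dog.

I want to be able to return a list that gives me the repeated words along with a count of how many times each one occurs. I don't know in advance which words are repeated, so I can't hard-code them into the SQL query.

The result I need is "Dog" - 3, "The" - 2, "I" - 2

I can't post images, so I can't post the table.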

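3 Answers:

Answer 0: (score: 1)

This is not necessarily a very efficient way to get the result you are looking for, but it will output the list of words that occur two or more times in the specified summary: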

DECLARE @summary NVARCHAR(MAX)
SET @summary = N'I went to the park to find a dog. The dog was not there. I left because there was no dog.'

SET NOCOUNT ON

DECLARE @PosA   INT
DECLARE @Word   NVARCHAR(MAX)

-- A temporary table to hold matches
CREATE TABLE dbo.#WordList
(
    Word        NVARCHAR(MAX),
    WordCount   INT
)

SET @PosA = 0
WHILE (LEN(@summary) > 0)
BEGIN
    -- Find the position of the word end
    SET @PosA = CHARINDEX(' ', @summary)
    IF (@PosA = 0)
        SET @PosA = LEN(@summary) + 1

    -- Extract the word and shorten the summary text
    SET @Word = LEFT(@summary, @PosA - 1)  -- everything before the space (or the whole remaining text)
    IF (@PosA < LEN(@summary))
        SET @summary = SUBSTRING(@summary, @PosA + 1, LEN(@summary) - @PosA)
    ELSE
        SET @summary = ''

    -- Strip punctuation
    SET @Word = REPLACE(REPLACE(@Word, '.', ''), ',', '')

    -- Add or create the word
    IF EXISTS ( SELECT TOP 1 1 FROM dbo.#WordList WHERE Word = @Word)
        UPDATE  dbo.#WordList
          SET   WordCount = WordCount + 1
          WHERE (Word = @Word)
    ELSE
        INSERT INTO dbo.#WordList (Word, WordCount)
          VALUES (@Word, 1)
END

-- Get results
SELECT  *
  FROM  dbo.#WordList
  WHERE (WordCount > 1)
  ORDER BY Word

--- Tidy up
DROP TABLE dbo.#WordList

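In effect, the summary text is split at each space, and punctuation is then stripped from each resulting word. The words are stored in the #WordList temporary table, and each word's count is incremented as needed.

The results are returned at the end.

Note that you may want to improve the punctuation removal, since I only handled full stops and commas for this answer.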
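
One possible way to broaden the punctuation removal is sketched below. It is not part of the original answer: it assumes that a "word" consists only of ASCII letters and digits, and it repeatedly deletes the first character that falls outside that set using PATINDEX and STUFF. The variable names are illustrative. Note that under this assumption apostrophes and accented letters may also be removed, depending on the collation, so adjust the pattern to suit your data.

```sql
-- Illustrative input; in the loop above this would be the current @Word
DECLARE @Word NVARCHAR(MAX) = N'"dog!",';

-- Position of the first character that is not a letter or digit (0 if none)
DECLARE @Pos INT = PATINDEX(N'%[^a-zA-Z0-9]%', @Word);

WHILE @Pos > 0
BEGIN
    -- Delete that single character, then look for the next one
    SET @Word = STUFF(@Word, @Pos, 1, N'');
    SET @Pos  = PATINDEX(N'%[^a-zA-Z0-9]%', @Word);
END

SELECT @Word AS CleanWord;
```

This loop could stand in for the nested REPLACE calls in the "Strip punctuation" step.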

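Answer 1: (score: 0)

I think that for each row you need to split the Summary column into separate rows. You can then select from that result set and count each value. Here is a link to some good Split functions: Split functions

They are old but still very effective. I think something like this TVF will get you going: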

CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
    FROM Pieces
  )
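
As a rough sketch of how the function above might be used to count words (this query is not from the original answer), the pieces can be grouped and filtered with HAVING. It assumes dbo.Split has been created as shown, strips only full stops and commas, and uses an illustrative @summary variable. Two limitations are worth keeping in mind: the function's varchar(512) parameter silently truncates longer text, and a recursive CTE stops after 100 recursions by default, so the MAXRECURSION hint is placed on the calling query.

```sql
DECLARE @summary VARCHAR(512) =
    'I went to the park to find a dog. The dog was not there. I left because there was no dog.';

SELECT   w.Word, COUNT(*) AS WordCount
FROM     dbo.Split(' ', @summary) AS p
-- Strip full stops and commas from each piece before counting
CROSS APPLY (SELECT REPLACE(REPLACE(p.s, '.', ''), ',', '') AS Word) AS w
WHERE    w.Word <> ''              -- ignore empty pieces caused by repeated spaces
GROUP BY w.Word
HAVING   COUNT(*) > 1              -- keep only words that repeat
ORDER BY WordCount DESC, w.Word
OPTION   (MAXRECURSION 0);         -- lift the default limit of 100 recursions
```

To run this against a table rather than a variable, the function can be applied per row, for example FROM dbo.YourTable AS t CROSS APPLY dbo.Split(' ', t.Summary) AS p, where dbo.YourTable is a placeholder name.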

Answer 2: (score: 0)

DECLARE @summaries TABLE (id int, summary nvarchar(max))
INSERT @summaries values
  (1,N'I went to the park to find a dog. The dog was not there. I left because there was no dog.')

SELECT id, word, COUNT(*) c
FROM @summaries
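-- Strip '.' and ',' so that "dog." and "dog" count as the same word, then wrap each word in <a>...</a>
CROSS APPLY (SELECT CAST('<a>'+REPLACE(REPLACE(REPLACE(summary,'.',''),',',''),' ','</a><a>')+'</a>' AS xml) xml1 ) t1
-- Shred the XML into one row per word
CROSS APPLY (SELECT n.value('.','nvarchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2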
GROUP BY id, word
HAVING COUNT(*) > 1
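
One caveat with the XML approach: the CAST to xml fails if a summary contains characters that are special in XML, such as & or <. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to escape those characters before building the XML string. The sample row and the alias names (escaped, e) are made up for illustration.

```sql
DECLARE @summaries TABLE (id int, summary nvarchar(max));
INSERT @summaries VALUES (1, N'Cats & dogs: the dogs chased the cats.');

SELECT id, word, COUNT(*) AS c
FROM @summaries
-- Escape XML special characters first; '&' must be handled before '<' and '>'
CROSS APPLY (SELECT REPLACE(REPLACE(REPLACE(summary, '&', '&amp;'), '<', '&lt;'), '>', '&gt;') AS escaped) e
-- Wrap each space-separated token in <a>...</a> and cast the result to XML
CROSS APPLY (SELECT CAST('<a>' + REPLACE(escaped, ' ', '</a><a>') + '</a>' AS xml) AS xml1) t1
-- Shred the XML into one row per token; the escapes are decoded again here
CROSS APPLY (SELECT n.value('.', 'nvarchar(max)') AS word FROM xml1.nodes('a') x(n)) t2
GROUP BY id, word
HAVING COUNT(*) > 1;
```

This sketch only addresses escaping; punctuation attached to a word still makes it count as a different word unless it is stripped as well.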