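Using T-SQL, how do I find and group the words before and after a given term?

Asked: 2012-04-03 01:25:45

Tags: sql sql-server tsql sql-server-2008-r2 common-table-expression

Given a specific word pattern (say "balloon"), I would like to find the n words before and after it, group by them, and get a count of how often each combination occurs in the Title column of a table.

For example, if the data set is:

  • red balloon sky
  • yellow balloon sky road
  • blue balloon chair

I'd like the result to be: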

- red balloon | 1
- yellow balloon | 1
- blue balloon | 1
- balloon sky | 2
- balloon chair | 1

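I think the best way to accomplish this is to use regular expressions in my sproc. So I added a set of powerful regular expression functions, including a FindWordsInContext function.

To start: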

WITH Words_CTE (Title)
AS
-- Define the CTE query.
(
    SELECT Title
    FROM ItemData
    WHERE Title LIKE '%balloon%'
)
-- Define the outer query referencing the CTE name.
SELECT Title
FROM Words_CTE

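So I figured I would start from that, put the FindWordsInContext function into the mix, and then group by the words before and after the given word.

- UPDATE -

Thanks to Adrian Iftode below... but the code doesn't quite do what I want.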

declare @table table(Sentence varchar(250))

insert into @table(sentence)
    values ('I have another red balloon in the car.'),
            ('Here is a new balloon for you.'),
            ('A red balloon is in the other room.'),
            ('Is there another balloon for me?')


select TOP(5) SentencePart, NumberOfWords
from @table
cross apply dbo.fnGetPartsFromSentence(Sentence, 'balloon') f
order by
  NumberOfWords DESC,
  case when f.Side = 'R' then 0
  else 1 end

Output:

balloon is in the other room.       5
I have another red balloon          4
Here is a new balloon               4
Is there another balloon            3
balloon in the car.                 3

I'd like to be able to set a range on either side of "balloon". In this case, let's say one word, and the output should be:

red balloon      2
new balloon      1
another balloon  1
balloon in       1
balloon for      2
balloon is       1

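1 answer:

Answer 0 (score: 0)

This is quite a bit of code, so I'll try to explain it.

First I used a split function that splits a varchar by a given varchar separator: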
CREATE FUNCTION [dbo].[fnSplitString](@str NVARCHAR(MAX),@sep NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
    WITH a AS(
        SELECT CAST(0 AS BIGINT) AS idx1,
               CHARINDEX(@sep,@str) idx2, 
               1 as [Level]
        UNION ALL
        SELECT idx2 + coalesce(nullif(LEN(@sep),0),1),
               CHARINDEX(@sep,@str, idx2 + 1), 
               [Level] + 1 as [Level]
        FROM a
        WHERE idx2 > 0
    )
    SELECT SUBSTRING(@str,idx1,COALESCE(NULLIF(idx2,0),LEN(@str)+1)-idx1) AS Value, 
           [Level], 
           case when idx1 = 0 then 'R' when idx2 != 0 then 'LR' else 'L' end as Side
    FROM a  

Given the varchar 'red balloon sky' and the space character as the separator, it outputs:

select *
from dbo.fnSplitString('red balloon sky', ' ')

Value   Level   Side
red      1       R
balloon  2       LR
sky      3       L

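The Side column means: R, the space is to the right of the word; L, the space is to the left of the word; LR, the word is surrounded by spaces.

When splitting by 'balloon':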

select *
from dbo.fnSplitString('red balloon sky', 'balloon')

red     1   R
 sky    2   L

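So balloon appears to the right of red and to the left of sky.

With this helper function in place, I created another function that outputs the desired format for a single sentence (varchar):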

create FUNCTION [dbo].[fnGetPartsFromSentence](@sentence NVARCHAR(MAX),@word NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN


with RawData as
(select rtrim(ltrim(f.Value)) as LR, 
       (select COUNT (*) from dbo.fnSplitString(rtrim(ltrim(f.Value)), ' ')) as NumberOfWords,
       f.Side,
       0 as SideLevel
from dbo.fnSplitString(@sentence, @word) as f
where f.Side = 'R' or f.Side = 'L'
union all
(
    select rtrim(ltrim(f.Value)) as LR, 
       (select COUNT (*) from dbo.fnSplitString(rtrim(ltrim(f.Value)), ' ')) as NumberOfWords,
       f.Side,
       sl.no as SideLevel
    from dbo.fnSplitString(@sentence, @word) as f
    join (select 1 as no union all select 2) sl on 1 = 1
    where f.Side = 'LR'
)
)
select (case when Side = 'R' then LR + ' ' + @word 
             when Side = 'L' then @word + ' ' + LR
             when Side = 'LR' then  
                    (
                        case when SideLevel  = 1 then @word + ' ' + LR
                        when SideLevel  = 2 then LR + ' ' + @word 
                        end
                    )
            end) as SentencePart,
        (case when Side = 'R' or Side = 'L' then Side
              else           
                   (    case when SideLevel  = 1 then 'L'
                        when SideLevel  = 2 then 'R'
                        end
                    )
            end) as Side,
        NumberOfWords           
from RawData

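This function uses the previous one. First it splits the sentence by the word, then it counts the words in each piece by splitting that piece again on spaces. When a piece has the word on both sides (Side = 'LR'), it duplicates the piece by joining it against the values 1 and 2.

The function also outputs each piece concatenated with the word, placed according to which side of the piece the word is on: left, right, or both. It outputs Side as well, this time only as L or R.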

select *
from [dbo].[fnGetPartsFromSentence]('yellow balloon sky road','balloon')

SentencePart        Side    NumberOfWords

yellow balloon      R           1
balloon sky road    L           2
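
The duplication of a piece that has the word on both sides only shows up when a sentence contains the word more than once. A minimal sketch, assuming both functions above have already been created; the sample sentence is made up for illustration:

```sql
-- Hypothetical sample sentence with two occurrences of the word.
-- The middle piece ('and blue') has Side = 'LR' in fnSplitString,
-- so fnGetPartsFromSentence emits it twice: once per neighbouring occurrence.
select *
from [dbo].[fnGetPartsFromSentence]('red balloon and blue balloon sky', 'balloon')
```

Here the middle piece should come back as both 'balloon and blue' (Side L) and 'and blue balloon' (Side R), each with NumberOfWords = 2, alongside 'red balloon' and 'balloon sky'. Row order is not guaranteed without an ORDER BY.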

Now, with this function, I can cross apply it against a table:

declare @table table(Sentence varchar(250))

insert into @table(sentence)
    values ('red balloon sky'),
            ('yellow balloon sky road'),
            ('blue balloon chair')


select SentencePart, NumberOfWords
from @table
cross apply dbo.fnGetPartsFromSentence(Sentence, 'balloon') f
order by
  case when f.Side = 'R' then 0
  else 1 end

Output:

red balloon           1
yellow balloon        1
blue balloon          1
balloon chair         1
balloon sky road      2
balloon sky           1

This also works when the word occurs more than once in a sentence.
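
To get closer to the grouped output requested in the update (one word on either side of the term, with a count), the pieces can be reduced to their nearest word and then aggregated. The following is only a sketch under some assumptions: it reuses dbo.fnSplitString from above, fixes the range at one word, and the names Pieces, Neighbours, Phrase and Occurrences are made up for illustration. It matches the term as a plain substring, so punctuation stays attached to words and a term like 'balloons' would also match.

```sql
-- Sketch: count the single word immediately before and after @word.
-- Assumes dbo.fnSplitString from this answer already exists.
DECLARE @word NVARCHAR(50) = N'balloon';

DECLARE @table TABLE (Sentence VARCHAR(250));

INSERT INTO @table (Sentence)
VALUES ('I have another red balloon in the car.'),
       ('Here is a new balloon for you.'),
       ('A red balloon is in the other room.'),
       ('Is there another balloon for me?');

WITH Pieces AS
(
    -- Text between occurrences of @word, with the Side flag from fnSplitString.
    -- Sentences without @word are skipped, since the split would return them whole.
    SELECT LTRIM(RTRIM(p.Value)) AS Piece, p.Side
    FROM @table AS t
    CROSS APPLY dbo.fnSplitString(t.Sentence, @word) AS p
    WHERE t.Sentence LIKE '%' + @word + '%'
),
Neighbours AS
(
    -- Last word of every piece that has @word on its right.
    SELECT w.Value + ' ' + @word AS Phrase
    FROM Pieces AS pc
    CROSS APPLY (SELECT TOP (1) s.Value
                 FROM dbo.fnSplitString(pc.Piece, ' ') AS s
                 ORDER BY s.[Level] DESC) AS w
    WHERE pc.Side IN ('R', 'LR') AND pc.Piece <> ''
    UNION ALL
    -- First word of every piece that has @word on its left.
    SELECT @word + ' ' + w.Value
    FROM Pieces AS pc
    CROSS APPLY (SELECT TOP (1) s.Value
                 FROM dbo.fnSplitString(pc.Piece, ' ') AS s
                 ORDER BY s.[Level] ASC) AS w
    WHERE pc.Side IN ('L', 'LR') AND pc.Piece <> ''
)
SELECT Phrase, COUNT(*) AS Occurrences
FROM Neighbours
GROUP BY Phrase
ORDER BY Occurrences DESC, Phrase;
```

With the four sample sentences from the update, this should group 'red balloon' and 'balloon for' with a count of 2 and the remaining phrases with a count of 1. Widening the range to n words would need the n nearest words of each piece concatenated back together, for example with FOR XML PATH on SQL Server 2008 R2, which is left out here to keep the sketch short.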