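Given a specific word pattern (say "balloon"), I want to find the n words before and after it, group by those phrases, and get a count of how often each one occurs in the Title column of a table.
For example, if the dataset is:
I would like the result to be: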
- red balloon | 1
- yellow balloon | 1
- blue balloon | 1
- balloon sky | 2
- balloon chair | 1
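I think the best way to accomplish this is to use regular expressions in my sproc. So I added the regex functions listed here, along with the FindWordsInContext function.
To start with: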
WITH Words_CTE (Title)
AS
-- Define the CTE query.
(
SELECT Title
FROM ItemData
WHERE Title LIKE '%balloon%'
)
-- Define the outer query referencing the CTE name.
SELECT Title
FROM Words_CTE
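So I figured I would start from there, bring the FindWordsInContext function into the mix, and then group by the word(s) before/after the given word.
- UPDATE -
Thanks to Adrian Iftode below... but the code doesn't quite do what I need.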
declare @table table(Sentence varchar(250))
insert into @table(sentence)
values ('I have another red balloon in the car.'),
('Here is a new balloon for you.'),
('A red balloon is in the other room.'),
('Is there another balloon for me?')
select TOP(5) SentencePart, NumberOfWords
from @table
cross apply dbo.fnGetPartsFromSentence(Sentence, 'balloon') f
order by
NumberOfWords DESC,
case when f.Side = 'R' then 0
else 1 end
Output:
balloon is in the other room. 5
I have another red balloon 4
Here is a new balloon 4
Is there another balloon 3
balloon in the car. 3
I want to be able to set a range on either side of "balloon". In this case, with a range of one word, the output should be:
red balloon 2
new balloon 1
another balloon 1
balloon in 1
balloon for 2
balloon is 1
Answer 0 (score: 0)
It's quite a bit of code, so I'll try to explain it.
First I used a split function, which splits a varchar by a given varchar:
CREATE FUNCTION [dbo].[fnSplitString](@str NVARCHAR(MAX),@sep NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
WITH a AS(
SELECT CAST(0 AS BIGINT) AS idx1,
CHARINDEX(@sep,@str) idx2,
1 as [Level]
UNION ALL
SELECT idx2 + coalesce(nullif(LEN(@sep),0),1),
CHARINDEX(@sep,@str, idx2 + 1),
[Level] + 1 as [Level]
FROM a
WHERE idx2 > 0
)
SELECT SUBSTRING(@str,idx1,COALESCE(NULLIF(idx2,0),LEN(@str)+1)-idx1) AS Value,
[Level],
case when idx1 = 0 then 'R' when idx2 != 0 then 'LR' else 'L' end as Side
FROM a
Given the varchar 'red balloon sky' and a space as the separator, it outputs:
select *
from dbo.fnSplitString('red balloon sky', ' ')
Value Level Side
red 1 R
balloon 2 LR
sky 3 L
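The Side column means: R, the separator is to the right of the value; L, the separator is to the left of the value; LR, the value has a separator on both sides.
When splitting by 'balloon':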
select *
from dbo.fnSplitString('red balloon sky', 'balloon')
red 1 R
sky 2 L
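So balloon appears to the right of red and to the left of sky.
With this helper function in place, I created another function that outputs the desired format for a single sentence (varchar):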
create FUNCTION [dbo].[fnGetPartsFromSentence](@sentence NVARCHAR(MAX),@word NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
with RawData as
(select rtrim(ltrim(f.Value)) as LR,
(select COUNT (*) from dbo.fnSplitString(rtrim(ltrim(f.Value)), ' ')) as NumberOfWords,
f.Side,
0 as SideLevel
from dbo.fnSplitString(@sentence, @word) as f
where f.Side = 'R' or f.Side = 'L'
union all
(
select rtrim(ltrim(f.Value)) as LR,
(select COUNT (*) from dbo.fnSplitString(rtrim(ltrim(f.Value)), ' ')) as NumberOfWords,
f.Side,
sl.no as SideLevel
from dbo.fnSplitString(@sentence, @word) as f
join (select 1 as no union all select 2) sl on 1 = 1
where f.Side = 'LR'
)
)
select (case when Side = 'R' then LR + ' ' + @word
when Side = 'L' then @word + ' ' + LR
when Side = 'LR' then
(
case when SideLevel = 1 then @word + ' ' + LR
when SideLevel = 2 then LR + ' ' + @word
end
)
end) as SentencePart,
(case when Side = 'R' or Side = 'L' then Side
else
( case when SideLevel = 1 then 'L'
when SideLevel = 2 then 'R'
end
)
end) as Side,
NumberOfWords
from RawData
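This function uses the previous one. First it splits the sentence by the word, then it counts the words in each part by splitting that part again on spaces. When the word appears on both sides of a part (Side = 'LR'), the part is duplicated by joining it against the values 1 and 2.
The function also outputs each part concatenated with the word, on whichever side the word was found: left, right, or both. It also outputs Side, which this time is only L or R.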
select *
from [dbo].[fnGetPartsFromSentence]('yellow balloon sky road','balloon')
SentencePart Side NumberOfWords
yellow balloon R 1
balloon sky road L 2
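To see the duplication of an LR part in action, here is a small check with a sentence that contains the word twice. It assumes both functions above have already been created; the sample sentence is made up for illustration, and the row order is not guaranteed without an ORDER BY.

```sql
-- Assumes dbo.fnSplitString and dbo.fnGetPartsFromSentence (defined above) exist.
-- 'and blue' sits between two occurrences of 'balloon', so the split marks it
-- as Side = 'LR' and the function returns it twice: once with the word in
-- front of it (Side L) and once with the word after it (Side R).
select SentencePart, Side, NumberOfWords
from dbo.fnGetPartsFromSentence('red balloon and blue balloon sky', 'balloon')

-- Rows I would expect (in no particular order):
--   red balloon        R  1
--   balloon and blue   L  2
--   and blue balloon   R  2
--   balloon sky        L  1
```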
Now, with this function, I can cross apply it to a table:
declare @table table(Sentence varchar(250))
insert into @table(sentence)
values ('red balloon sky'),
('yellow balloon sky road'),
('blue balloon chair')
select SentencePart, NumberOfWords
from @table
cross apply dbo.fnGetPartsFromSentence(Sentence, 'balloon') f
order by
case when f.Side = 'R' then 0
else 1 end
Output:
red balloon 1
yellow balloon 1
blue balloon 1
balloon chair 1
balloon sky road 2
balloon sky 1
It also works when the word occurs multiple times.
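For the one-word range and the grouped counts asked for in the update, one possible sketch is to skip fnGetPartsFromSentence and work on the output of fnSplitString directly, keeping only the word nearest to the search word on each side. This is not part of the code above, just a minimal illustration under a few assumptions: dbo.fnSplitString already exists, words are separated by single spaces, and punctuation attached to a neighbouring word is left as it is. The names @word, Parts, Pairs, Phrase and Occurrences are made up for the example.

```sql
-- Assumes dbo.fnSplitString (defined above) exists.
declare @word nvarchar(50) = N'balloon';

declare @table table(Sentence varchar(250));
insert into @table(Sentence)
values ('I have another red balloon in the car.'),
       ('Here is a new balloon for you.'),
       ('A red balloon is in the other room.'),
       ('Is there another balloon for me?');

with Parts as
(
    -- One row per piece of text between occurrences of the word.
    -- The LIKE filter skips sentences without the word, because the split
    -- function would return such a sentence whole with Side = 'R'.
    select ltrim(rtrim(f.Value)) as Part, f.Side
    from @table t
    cross apply dbo.fnSplitString(t.Sentence, @word) f
    where t.Sentence like '%' + @word + '%'
),
Pairs as
(
    -- Word right before the search word: last word of a part that has
    -- the search word on its right (Side R or LR).
    select right(Part, charindex(' ', reverse(Part) + ' ') - 1) + ' ' + @word as Phrase
    from Parts
    where Side in ('R', 'LR') and Part <> ''
    union all
    -- Word right after the search word: first word of a part that has
    -- the search word on its left (Side L or LR).
    select @word + ' ' + left(Part, charindex(' ', Part + ' ') - 1)
    from Parts
    where Side in ('L', 'LR') and Part <> ''
)
select Phrase, count(*) as Occurrences
from Pairs
group by Phrase
order by Occurrences desc, Phrase;
```

With the four sample sentences from the update, this should give red balloon and balloon for a count of 2 and the other four phrases a count of 1. Because the split is a plain substring match, a title containing "balloons" would also be picked up; a wider range than one word would need a different way of cutting the neighbouring words out of each part.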