I have a table with columns named "situation" and "entityid".
Entityid  Situation
1234      In the the world of of
3456      Total universe is is a
Can anyone give me a query to find these kinds of repeated words?
Thanks, Ramesh
Answer 0 (score: 1)
If you want to hard-code it:
select EntityID, Situation
from Entity
where Situation like '%the the%'
or Situation like '%of of%'
or Situation like '%is is%'
Update: here is a slightly less hard-coded approach:
select EntityID, Situation, right(s2, diff * 2 + 1) as RepeatedWords
from (
    select EntityID, Situation, WordNumber,
        -- s1 = the first WordNumber words, s2 = the first WordNumber + 1 words
        substring_index(Situation, ' ', WordNumber) s1,
        substring_index(Situation, ' ', WordNumber + 1) s2,
        -- diff = length of word number WordNumber + 1
        length(substring_index(Situation, ' ', WordNumber + 1)) - length(substring_index(Situation, ' ', WordNumber)) - 1 diff
    from `Entity` e
    inner join (
        -- numbers table: one row per word position to check
        select 1 as WordNumber
        union all select 2
        union all select 3
        union all select 4
        union all select 5
        union all select 6
        union all select 7
        union all select 8
        union all select 9
        union all select 10
    ) n
) a
where right(concat(' ', s1), diff + 1) = concat(' ', right(s2, diff)) -- compare whole words, so "bathe the" is not flagged
and diff > 0
order by EntityID, WordNumber
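It only searches roughly the first 10 words, and it does not correctly handle case, punctuation, or multiple spaces, but it should give you an idea of an approach you can take. If you want it to handle longer strings, just keep adding to the UNION ALL statements.
Answer 1 (score: 0)
If you are willing to use SQL Server Express, you will be able to create CLR user-defined functions.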
http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx
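You will then have the power of regular expressions at your disposal.
After that, depending on how proficient you are with RegEx, you may have either zero problems or two problems.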
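To give a feel for the regular-expression route, here is a minimal sketch in Python rather than in a CLR function, so treat it as an illustration of the pattern and not as SQL Server code. The names REPEATED_WORD and find_repeated_words are made up for this example, and the sample rows are the two from the question.

```python
import re

# A word, followed one or more times by whitespace and the same word again.
# The \b anchors require whole-word matches, so "bathe the" is not flagged.
# IGNORECASE lets "The the" count as a repeat.
REPEATED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def find_repeated_words(text):
    """Return each word that appears two or more times in a row in text."""
    return [match.group(1) for match in REPEATED_WORD.finditer(text)]


# Sample rows taken from the question.
rows = [
    (1234, "In the the world of of"),
    (3456, "Total universe is is a"),
]

for entity_id, situation in rows:
    print(entity_id, find_repeated_words(situation))
```

Unlike the hard-coded LIKE query, this is not limited to a fixed list of words or to the first 10 positions, and it tolerates multiple spaces between the repeated words. Punctuation between the two copies (for example "is, is") would still not match with this pattern.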