How to find repeated words in a cell in SQL

Time: 2009-06-11 11:28:12

Tags: mysql

I have a table with two columns, "Entityid" and "Situation".

Entityid    Situation
1234        In the the world of of
3456        Total universe is is a

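Can anyone give me a query that finds rows containing consecutive repeated words like these ("the the", "of of", "is is")?

Thanks, Ramesh

2 answers:

Answer 0: (score: 1)

If you want to hard-code it: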

select EntityID, Situation
from Entity
where Situation like '%the the%'
or Situation like '%of of%'
or Situation like '%is is%'

Update: here is a slightly less hard-coded approach:

select EntityID, Situation, right(s2, diff * 2 + 1) as RepeatedWords
from (
    select EntityID, Situation, WordNumber,
        -- s1 = the first WordNumber words, s2 = the first WordNumber + 1 words
        substring_index(Situation, ' ', WordNumber) s1,
        substring_index(Situation, ' ', WordNumber + 1) s2,
        -- diff = length of word WordNumber + 1 (-1 once the string has run out of words)
        length(substring_index(Situation, ' ', WordNumber + 1)) - length(substring_index(Situation, ' ', WordNumber)) - 1 diff
    from `Entity` e
    -- pair every row with the word positions 1 to 10
    cross join (
        select 1 as WordNumber
        union all
        select 2
        union all
        select 3
        union all
        select 4
        union all
        select 5
        union all
        select 6
        union all
        select 7
        union all
        select 8
        union all
        select 9
        union all
        select 10
    ) n
) a
-- compare whole words, so that "bathe the" is not reported as a repeat
where substring_index(s1, ' ', -1) = substring_index(s2, ' ', -1)
    and diff > 0
order by EntityID, WordNumber

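It only searches the first 10 words or so, and it won't handle case, punctuation, or multiple spaces correctly, but it should give you an idea of the approach you can take. If you want it to handle longer strings, just keep adding to the UNION ALL statement.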
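On MySQL 8.0 or later, the hand-written UNION ALL list could probably be replaced by a recursive common table expression that generates the word positions. The following is only a sketch under that assumption: the CTE name `n`, the cap of 50 words, and the word-count expression are illustrative choices, not part of the original answer. It shares the same limitations around punctuation and irregular spacing.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs) and the Entity table from the question.
with recursive n (WordNumber) as (
    select 1
    union all
    select WordNumber + 1 from n where WordNumber < 50   -- 50 is an arbitrary cap
)
select e.EntityID, e.Situation, n.WordNumber,
       substring_index(substring_index(e.Situation, ' ', n.WordNumber), ' ', -1) as RepeatedWord
from Entity e
cross join n
-- word count = number of spaces + 1; only look at positions that have a following word
where n.WordNumber < 1 + length(e.Situation) - length(replace(e.Situation, ' ', ''))
  -- word at position WordNumber equals the word at position WordNumber + 1
  and substring_index(substring_index(e.Situation, ' ', n.WordNumber), ' ', -1)
    = substring_index(substring_index(e.Situation, ' ', n.WordNumber + 1), ' ', -1)
  -- skip empty "words" produced by runs of spaces
  and substring_index(substring_index(e.Situation, ' ', n.WordNumber), ' ', -1) <> ''
order by e.EntityID, n.WordNumber;
```

Raising the cap only means changing the constant, as long as it stays below the server's `cte_max_recursion_depth` setting.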

Answer 1: (score: 0)

If you are willing to use SQL Server Express, you can create a CLR user-defined function.

http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx

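Then you will have the power of regular expressions at your disposal.

Then, depending on how proficient you are with RegEx, you may have zero problems or two problems.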
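Since the question is tagged mysql, it may be worth noting that a regular expression can be tried without leaving MySQL. This is not the CLR function described above. It is a sketch that assumes MySQL 8.0.4 or later, where the built-in regular expression functions are based on ICU and, as far as I know, accept backreferences such as `\1`. The column alias `FirstRepeat` is a made-up name.

```sql
-- Assumes MySQL 8.0.4+ (ICU regular expressions) and the Entity table from the question.
-- \b = word boundary, (\w+) = one word, \s+ = whitespace, \1 = the same word again.
-- Backslashes are doubled because MySQL string literals treat \ as an escape character.
select EntityID, Situation,
       regexp_substr(Situation, '\\b(\\w+)\\s+\\1\\b') as FirstRepeat
from Entity
where regexp_like(Situation, '\\b(\\w+)\\s+\\1\\b');
```

Unlike the word-position query, this pattern tolerates runs of whitespace between the two words. Whether "The the" counts as a repeat depends on the collation of the column, so test it against your own data before relying on it.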