Question

I have a table like this:

TITLE          |   DESCRIPTION
------------------------------------------------
test1          |   value blah blah value
test2          |   value test
test3          |   test test test
test4          |   value test value test

How can I select only the rows that contain consecutive repeated strings ("blah blah", but not repetitions separated by other words)?

The desired output should be just:

TITLE          |   DESCRIPTION
------------------------------------------------
test1          |   value blah blah value
test3          |   test test test

Answer 1

For this problem (and many others) you can create a helper table, once, that contains the natural numbers. It is useful for many purposes:

create table seq (num int);
insert into seq values (1),(2),(3),(4),(5),(6),(7),(8);
insert into seq select num+8  from seq;
insert into seq select num+16 from seq;
insert into seq select num+32 from seq;
insert into seq select num+64 from seq;
/* continue doubling the number of records until you feel you have enough */
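
As a quick sanity check, and assuming exactly the statements above have been run once, the helper table should hold the numbers 1 through 128 with no gaps or duplicates. A minimal sketch of such a check (the column aliases are only illustrative):

```sql
-- Assumes the create/insert statements above were each run exactly once.
select count(*)            as row_count,      -- expected 128
       min(num)            as smallest,       -- expected 1
       max(num)            as largest,        -- expected 128
       count(distinct num) as distinct_nums   -- expected 128
from   seq;
```

The table only needs to reach the largest number of spaces found in any description, so 128 rows is plenty for short phrases.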

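You can then join that table in your query, with each number serving as the position of a word in the phrase. That lets you extract each word and compare it with the next one: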

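select     title, description
from       phrases
where      description not in (
        -- descriptions in which some word equals the word right after it
        select     p.description
        from       phrases p
        inner join seq
                -- num runs from 1 to the number of spaces in the description
                on seq.num <= length(p.description)
                            - length(replace(p.description, ' ', ''))
                -- word number num equals word number num+1
               and substring_index(substring_index(
                                   p.description, ' ', seq.num), ' ', -1)
                   = substring_index(substring_index(
                                   p.description, ' ', seq.num + 1), ' ', -1)
        );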

The output for the sample data is:

| title |           description |
|-------|-----------------------|
| test2 |            value test |
| test4 | value test value test |
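
The nested substring_index calls are the core of the query, so a small sketch of what they return may help. The literal below is the test1 description from the question, and the column aliases are only illustrative:

```sql
-- Inner call keeps the first n words; outer call keeps the last word of that prefix.
select substring_index('value blah blah value', ' ', 2)
           as first_two_words,                              -- 'value blah'
       substring_index(
           substring_index('value blah blah value', ' ', 2), ' ', -1)
           as word_2,                                       -- 'blah'
       substring_index(
           substring_index('value blah blah value', ' ', 3), ' ', -1)
           as word_3,                                       -- 'blah'
       length('value blah blah value')
           - length(replace('value blah blah value', ' ', ''))
           as space_count;                                  -- 3
```

Words 2 and 3 are equal, so the test1 description is returned by the subquery and is therefore excluded by not in. The join condition limits num to the range 1 through space_count, so num+1 never runs past the last word.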

SQL fiddle
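
The query above uses not in, so it lists the rows without an adjacent repeated word. If the goal is the output shown in the question (test1 and test3), one option, sketched here under the same assumptions (a phrases table and the seq helper table), is to flip the condition to in:

```sql
-- Same subquery as in the answer; only "not in" is changed to "in".
select     title, description
from       phrases
where      description in (
        select     p.description
        from       phrases p
        inner join seq
                on seq.num <= length(p.description)
                            - length(replace(p.description, ' ', ''))
               and substring_index(substring_index(
                                   p.description, ' ', seq.num), ' ', -1)
                   = substring_index(substring_index(
                                   p.description, ' ', seq.num + 1), ' ', -1)
        );
```

For the sample data this returns test1 (blah blah) and test3 (test test). Like the original query, it assumes words are separated by single spaces.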

MySQL: find repetitions of strings in a VARCHAR field?
