Question

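I wrote what should have been a simple SQL query, but it turned out not to be so simple. I have a database of 1.2 million words (in several languages), and it is growing. I wondered how many words I could spell with 5 letters taken from the characters jxtehmrungce, so I decided to test it. Writing a query that does this turned out to be easy enough. BUT! There has to be a simpler solution? The more characters there are, the longer the query gets.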

Below, all the characters (letters) are looped through in alphabetical order:

SELECT count(DISTINCT `word`) as `numrows` FROM `words` WHERE LENGTH(`word`) = '5' AND `chars` REGEXP ' ([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])' AND `verified` = '1'

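If it ran well, I would use this on yougowords.com for a decoder tool that runs against a 3.9-million-row table, but as written it is a very time-consuming query. How can I improve it? It might take several regular expressions, but what happens if you change the character set to one with double or triple letters, for example by adding an extra j, g, or h, or by adding more letters, such as jjtehhmrungcs?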

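Edit: no character may be used more often than it occurs in the set, which is why you'll see 2 e's but not 2 of every other character. (jxtehmrungce)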

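{0,1} is the minimum-to-maximum setting: a duplicated character can occur more than once, and you can make a 5-letter word using just one of the duplicated letters or both of them. {0,1} could be written as {1,2}, but the maximum possible number of each letter still has to be set. The word can't have 3 e's, because jxtehmrungce only has 2.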

I have no SQL experience; I'm working from my own limited knowledge.

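The chars column: for a different program, I created a chars column that holds the letters of each word arranged in alphabetical order. So the word "life" becomes efil, and the word "happy" becomes ahppy. I could use either the word column or the chars column to get the same result from this query, but the chars column keeps the characters in order, so jxtehmrungce becomes ceeghjmnrtux. Could that help with finding words that contain no more than 2 e's?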
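To make the chars column idea concrete, here is a minimal Python sketch rather than SQL. It shows the alphabetical ordering described above and one way the "no more than 2 e's" rule could be checked by counting letters. The names `RACK`, `sorted_chars`, and `fits_rack` are made up for illustration and are not part of the original schema.

```python
from collections import Counter

RACK = "jxtehmrungce"  # the letter set from the question (hypothetical constant name)


def sorted_chars(word: str) -> str:
    """Return the word's letters in alphabetical order, like the chars column."""
    return "".join(sorted(word))


def fits_rack(word: str, rack: str = RACK) -> bool:
    """True if no letter of `word` is used more often than it occurs in `rack`."""
    need, have = Counter(word), Counter(rack)
    # Counter returns 0 for letters that are missing from the rack.
    return all(have[letter] >= count for letter, count in need.items())


if __name__ == "__main__":
    print(sorted_chars("happy"))
    print(fits_rack("three"), fits_rack("teeth"))
```

Counting letters this way does not depend on the order of the characters, so the sorted column is not strictly required for the check; it is mostly useful for exact anagram lookups.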

Answer 1

Does this do what you want?

select count(distinct word)
from words w
where word regexp '^[jxtehmrungce]{5}$' and verified = '1';

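Or are you looking for permutations of five of the characters?

Edit:

If each character in the list may be used only as often as it appears there, then your query is more complicated. I would take the approach of generating all possible combinations and then checking whether they are words: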

create view i_c as
   select 1 as i, 'j' as c union all
   select 2 as i, 'x' as c union all
   select 3 as i, 't' as c union all
   select 4 as i, 'e' as c union all
   select 5 as i, 'h' as c union all
   select 6 as i, 'm' as c union all
   select 7 as i, 'r' as c union all
   select 8 as i, 'u' as c union all
   select 9 as i, 'n' as c union all
   select 10 as i, 'g' as c union all
   select 11 as i, 'c' as c union all
   select 12 as i, 'e' as c;

select count(distinct w.word)
from i_c c1 join
     i_c c2
     on c2.i not in (c1.i) join
     i_c c3
     on c3.i not in (c1.i, c2.i) join
     i_c c4
     on c4.i not in (c1.i, c2.i, c3.i) join
     i_c c5
     on c5.i not in (c1.i, c2.i, c3.i, c4.i) join
     words w
     on concat(c1.c, c2.c, c3.c, c4.c, c5.c) = w.word
where w.verified = 1;
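The same generate-and-look-up idea can be sketched outside the database. The snippet below is only an illustration in Python, assuming the verified words have already been loaded into an in-memory collection; `count_rack_words` and the tiny sample dictionary are hypothetical. `itertools.permutations` treats each position in the rack as distinct, which plays the same role as the `i` column in the view: the two e's can both be used, but no single letter position is used twice.

```python
from itertools import permutations


def count_rack_words(rack, dictionary, length=5):
    """Count distinct dictionary words spelled by `length` letters drawn from `rack`."""
    # Each permutation picks `length` different positions of the rack, so a letter
    # is available exactly as many times as it occurs in the rack.
    candidates = {"".join(p) for p in permutations(rack, length)}
    # The set removes duplicates caused by the repeated 'e', like count(distinct ...).
    return len(candidates & set(dictionary))


if __name__ == "__main__":
    sample = {"three", "meter", "teeth", "hello", "um"}  # made-up sample data
    print(count_rack_words("jxtehmrungce", sample))
```

For 12 letters and a length of 5 this generates 95,040 arrangements before deduplication, which is small; the number grows quickly as the rack or the word length increases.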

Answer 2

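Building on Gordon's great answer above, you could create a temporary table that stores each character together with the maxCount of times it is allowed to appear, then use a NOT EXISTS subquery in the WHERE clause to check that no letter appears more than maxCount times. I don't have MySQL installed to test it, but the SQL Server version of my query works correctly, and I think I've converted all the syntax to MySQL properly.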

CREATE TEMPORARY TABLE chars(letter char(1) not null, maxCount int not null);

INSERT INTO chars(letter, maxCount)
VALUES ('j',1),('x',1),('t',1),('e',2),('h',1),('m',1),('r',1),('u',1),('n',1),('g',1),('c',1) 
;

select count(distinct word)
from words w
where LENGTH(word) = 5 and word regexp '[jxtehmrungce]{5}' and verified = '1'
        and not exists(
            select 1 
            from chars c 
            -- This checks how many times each character occurs in the word.
            -- Ex: 'asdfee' has len = 6; if I replace the e's, it becomes 'asdf' with len = 4, and 6 - 4 = 2
            where length(w.word) - length(replace(w.word, c.letter, '')) > c.maxCount 
        )
;

Here is a SQL Fiddle demo: http://sqlfiddle.com/#!2/b0da4/2
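The length-minus-replace trick in the subquery may be easier to see in isolation. This is a small Python sketch of the same arithmetic, not the query itself; `occurrences` and `exceeds_limit` are hypothetical helper names.

```python
def occurrences(word, letter):
    """Count `letter` in `word` the way the SQL does: length minus length after removal."""
    return len(word) - len(word.replace(letter, ""))


def exceeds_limit(word, max_count):
    """True if any listed letter occurs more often than its allowed maximum."""
    return any(occurrences(word, letter) > limit for letter, limit in max_count.items())


if __name__ == "__main__":
    print(occurrences("asdfee", "e"))
    print(exceeds_limit("teeth", {"t": 1, "e": 2}))
```

Note that this check only constrains letters that are listed in the table. Letters outside the list pass it untouched, which is why the query also needs the regular-expression condition.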

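You could also look into using GROUP_CONCAT (https://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat) to make the regular-expression pattern dynamic. Below is a dynamic example that builds the pattern from the values in the chars temporary table and from the value set for the @targetWordLen variable. This makes it easy to add new characters to the list and to change the target word length.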

SQL Fiddle demo of the dynamic version: http://sqlfiddle.com/#!2/b0da4/29

SET @targetWordLen := 5;
set @regExPattern := concat('[',(select  group_concat(letter SEPARATOR '') from chars),']{', @targetWordLen, '}');

select count(distinct w.word)
from words w
where LENGTH(word) = @targetWordLen 
  and w.word regexp @regExPattern 
  and w.verified = 1
  and not exists(
            select * 
            from chars c 
            where length(w.word) - length(replace(w.word, c.letter, '')) > c.maxCount 
        )
;
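If you want to try the query shape without a MySQL server, the sketch below runs essentially the same statement against an in-memory SQLite database from Python. Several things here are assumptions and not part of the answer above: SQLite stands in for MySQL, the REGEXP operator is supplied by a small Python function (SQLite has none built in), the pattern is assembled in Python instead of with GROUP_CONCAT, and `count_matching` plus the sample words are made up for illustration.

```python
import re
import sqlite3


def count_matching(words, letter_limits, target_len=5):
    """Count distinct words of `target_len` that respect the per-letter limits."""
    conn = sqlite3.connect(":memory:")
    # SQLite evaluates `X REGEXP Y` by calling a user function regexp(Y, X).
    conn.create_function(
        "regexp", 2, lambda pattern, value: int(re.fullmatch(pattern, value) is not None)
    )
    conn.execute("CREATE TABLE words (word TEXT NOT NULL, verified INTEGER NOT NULL)")
    conn.execute("CREATE TABLE chars (letter TEXT NOT NULL, maxCount INTEGER NOT NULL)")
    conn.executemany("INSERT INTO words VALUES (?, 1)", [(w,) for w in words])
    conn.executemany("INSERT INTO chars VALUES (?, ?)", list(letter_limits.items()))
    # Build the character class from the allowed letters, e.g. [jxtehmrungc]{5}.
    pattern = "[" + "".join(letter_limits) + "]{" + str(target_len) + "}"
    (total,) = conn.execute(
        """
        SELECT count(DISTINCT w.word)
        FROM words w
        WHERE length(w.word) = ?
          AND w.word REGEXP ?
          AND w.verified = 1
          AND NOT EXISTS (
                SELECT 1
                FROM chars c
                WHERE length(w.word) - length(replace(w.word, c.letter, '')) > c.maxCount
          )
        """,
        (target_len, pattern),
    ).fetchone()
    conn.close()
    return total


if __name__ == "__main__":
    limits = {"j": 1, "x": 1, "t": 1, "e": 2, "h": 1, "m": 1,
              "r": 1, "u": 1, "n": 1, "g": 1, "c": 1}
    print(count_matching(["three", "meter", "teeth", "hello", "trunk"], limits))
```

Changing the limits dictionary or `target_len` is all that is needed to try a different letter set or word length, which mirrors what the chars table and @targetWordLen do in the dynamic query.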

SQL - 1388-character SQL query. (There must be a simpler solution?)
