Question

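I'm building a word unscrambler (PHP/MySQL) that takes between 2 and 8 letters of user input and returns words of 2 to 8 letters that can be made from those letters. A word need not use all of the letters, but it must never use more letters than were supplied.

The user will enter something like MSIKE or MSIKEI (two i's), or any other combination of letters, possibly with letters that occur more than once.

The query below finds all words that contain M, S, I, K, or E.

However, it also returns words that repeat a letter more often than the user entered it. For example, it returns the word meek even though the user didn't enter two e's, and the word kiss even though the user didn't enter s twice.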

SELECT word
FROM words
WHERE word REGEXP '[msike]'
AND has_a=0
AND has_b=0
AND has_c=0
AND has_d=0
-- (we skip e) or we could add has_e=1
AND has_f=0
-- ...and so on, skipping the letters m, s, i, k, and e
AND has_w=0
AND has_x=0
AND has_y=0
AND has_z=0

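Note that the columns has_a, has_b, etc. are 1 if the letter occurs in the word and 0 if it does not.

I'm open to any changes to the table schema.

This site: http://grecni.com/texttwist.php is a good example of what I'm trying to emulate.

The question is how to modify the query so that it does not return words with multiple occurrences of a letter unless the user specifically entered that letter multiple times. Grouping by word length would be an added bonus.

Thank you very much.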

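Edit: I changed the database following @awei's suggestion. has_{letter} is now count_{letter} and stores the total number of occurrences of the corresponding letter in the corresponding word. This may be useful when the user enters a letter more than once. For example: the user enters MSIKES (two s's).

In addition, I've abandoned the REGEXP approach shown in the original SQL statement. I'm trying to do most of the work on the PHP side, but there are still many obstacles.

Edit: Included the first 10 rows from the table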

id  word        alpha       otcwl   ospd    csw sowpods dictionary  enable  vowels  consonants  start_with  end_with    end_with_ing    end_with_ly end_with_xy count_a count_b count_c count_d count_e count_f count_g count_h count_i count_j count_k count_l count_m count_n count_o count_p count_q count_r count_s count_t count_u count_v count_w count_x count_y count_z q_no_u  letter_count    scrabble_points wwf_points  status  date_added  
1   aa          aa          1       0       0   1       1           1       aa                  a           a           0               0           0           2       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       2               2               2           1       2015-11-12 05:39:45
2   aah         aah         1       0       0   1       0           1       aa      h           a           h           0               0           0           2       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       3               6               5           1       2015-11-12 05:39:45
3   aahed       aadeh       1       0       0   1       0           1       aae     hd          a           d           0               0           0           2       0       0       1       1       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       5               9               8           1       2015-11-12 05:39:45
4   aahing      aaghin      1       0       0   1       0           1       aai     hng         a           g           1               0           0           2       0       0       0       0       0       1       1       1       0       0       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       6               10              11          1       2015-11-12 05:39:45
5   aahs        aahs        1       0       0   1       0           1       aa      hs          a           s           0               0           0           2       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       4               7               6           1       2015-11-12 05:39:45
6   aal         aal         1       0       0   1       0           1       aa      l           a           l           0               0           0           2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       3               3               4           1       2015-11-12 05:39:45
7   aalii       aaiil       1       0       0   1       1           1       aaii    l           a           i           0               0           0           2       0       0       0       0       0       0       0       2       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       5               5               6           1       2015-11-12 05:39:45
8   aaliis      aaiils      1       0       0   1       0           1       aaii    ls          a           s           0               0           0           2       0       0       0       0       0       0       0       2       0       0       1       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       6               6               7           1       2015-11-12 05:39:45
9   aals        aals        1       0       0   1       0           1       aa      ls          a           s           0               0           0           2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       4               4               5           1       2015-11-12 05:39:45
10  aardvark    aaadkrrv    1       0       0   1       1           1       aaa     rdvrk       a           k           0               0           0           3       0       0       1       0       0       0       0       0       0       1       0       0       0       0       0       0       2       0       0       0       1       0       0       0       0       0       8               16              17          1       2015-11-12 05:39:45

Answer 1

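I think you've already done the hard work with the modified schema. All you need to do now is modify the query so that each letter's count is <= the number of times the user entered that letter.

E.g., if the user enters "ALIAS":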

SELECT word
FROM words
WHERE count_a <= 2
  AND count_b <= 0
  AND count_c <= 0
  AND count_d <= 0
  AND count_e <= 0
  AND count_f <= 0
  AND count_g <= 0
  AND count_h <= 0
  AND count_i <= 1
  AND count_j <= 0
  AND count_k <= 0
  AND count_l <= 1
  AND count_m <= 0
  AND count_n <= 0
  AND count_o <= 0
  AND count_p <= 0
  AND count_q <= 0
  AND count_r <= 0
  AND count_s <= 1
  AND count_t <= 0
  AND count_u <= 0
  AND count_v <= 0
  AND count_w <= 0
  AND count_x <= 0
  AND count_y <= 0
  AND count_z <= 0
ORDER BY CHAR_LENGTH(word), word;

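Note: as requested, this sorts by word length and then alphabetically. I've used <= even for the zero counts just to make it easier to modify by hand for other letters.

From the ten sample rows, this will return "aa", "aal", and "aals" (but not "aalii" or "aaliis", since they both have two "i"s).

See the SQL Fiddle demo.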
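
To sanity-check the rule outside the database, the same "no more of each letter than was entered" test can be sketched in PHP. The helper name canBuild is hypothetical (it is not part of the question or this answer), and the word list is just the ten sample rows from the question:

```php
<?php
// Hypothetical helper: true when $word can be spelled from $letters,
// using each supplied letter at most as many times as it was entered.
function canBuild(string $word, string $letters): bool
{
    // count_chars(..., 1) maps byte value => occurrences (only bytes that occur).
    $have = count_chars(strtolower($letters), 1);
    foreach (count_chars(strtolower($word), 1) as $byte => $needed) {
        if ($needed > ($have[$byte] ?? 0)) {
            return false; // the word needs more of this letter than was given
        }
    }
    return true;
}

// The ten sample words from the question's table.
$sample = ['aa', 'aah', 'aahed', 'aahing', 'aahs', 'aal', 'aalii', 'aaliis', 'aals', 'aardvark'];
$matches = array_values(array_filter($sample, fn($w) => canBuild($w, 'ALIAS')));

echo implode(', ', $matches), PHP_EOL; // aa, aal, aals
```

This mirrors the count_x <= n conditions one letter at a time, so it can serve as a cross-check for the SQL results on a small word list.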

Answer 2

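Since you have two different requirements, I suggest implementing two different solutions.

If you don't care about duplicate letters, build a SET datatype with the 26 letters. Populate the bits according to which letters the word has. This ignores duplicate letters. It also makes it easy to find words that use only a subset of the letters: (the_set & ~the_letters) = 0.

If you do care about duplicates, sort the letters in the word and store that as a key. "msike" becomes "eikms".

Build a table with 3 columns: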

eikms -- non unique index on this
msike -- the real word - probably good to have this as the PRIMARY KEY
SET('m','s','i','k','e') -- for the other situation.

msikei and meek would be entered as

eikms
msikei 
SET('m','s','i','k','e') -- (or, if more convenient: SET('m','i','s','i','k','e'))

ekm
meek
SET('e','k','m')

REGEXP is not practical for your task.
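
A minimal PHP sketch of building the sorted-letter key follows. The function name sortedKey and its $dedupe flag are assumptions made for illustration. With $dedupe set to true the output matches the keys in the listing above ("eikms", "ekm"); with the default, repeated letters stay in the key:

```php
<?php
// Hypothetical helper: sort a word's letters to form a lookup key.
// $dedupe = true drops repeated letters first (an assumption that matches
// the example keys listed above); the default keeps every occurrence.
function sortedKey(string $word, bool $dedupe = false): string
{
    $chars = str_split(strtolower($word));
    if ($dedupe) {
        $chars = array_unique($chars);
    }
    sort($chars, SORT_STRING); // sort() also reindexes the array
    return implode('', $chars);
}

echo sortedKey('msike'), PHP_EOL;        // eikms
echo sortedKey('msikei', true), PHP_EOL; // eikms
echo sortedKey('meek', true), PHP_EOL;   // ekm
echo sortedKey('meek'), PHP_EOL;         // eekm
```

Which variant to store depends on whether the key itself should carry the duplicate information or whether a separate column does that job.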

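Edit 1

I think you also need a column to indicate whether there are any doubled letters in the word. That way, you can distinguish that kiss is allowed for msikes but not for msike.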

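Edit 2

A SET or an INT UNSIGNED can hold 1 bit for each of the 26 letters: 0 for not present, 1 for present.

msikes and msike would both go into the set with the same 5 bits turned on. For a SET column, the value to INSERT for msikes would be 'm,s,i,k,e,s'. Since the rest needs to involve Boolean arithmetic, it may be better to use INT UNSIGNED. So...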

a is 1 (1 << 0)
b is 2 (1 << 1)
c is 4 (1 << 2)
d is 8 (1 << 3)
...
z is (1 << 25)

To INSERT, use the | operator. bad becomes

(1 << 1) | (1 << 0) | (1 << 3)

Note how the bits are laid out, with 'a' at the bottom:

SELECT BIN((1 << 1) | (1 << 0) | (1 << 3)); ==> 1011

Similarly, 'ad' is 1001. So, does 'ad' match 'bad'? The answer comes from

SELECT b'1001' & ~b'1011' = 0; ==> 1 (meaning 'true')

This means that all of the letters in 'ad' (1001) are found in 'bad' (1011). Let's introduce "bed", which is 11010.

SELECT b'11010' & ~b'1011' = 0; ==> FALSE because of 'e' (10000)

But 'dad' (1001) will work fine:

SELECT b'1001' & ~b'1011' = 0; ==> TRUE

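So, now comes the "dup" flag. Since 'dad' has duplicate letters but 'bad' does not, your rules say it is not a match. But it takes the "dup" flag to complete the decision.

If you haven't had a course in Boolean arithmetic, well, I've just presented the first couple of chapters. If I covered it too fast, find a math book on the subject and jump in. "It's not rocket science."

So, back to what code is needed to decide whether my_word contains only a subset of the letters and whether it is allowed to have duplicate letters: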

SELECT $my_mask & ~tbl.mask = 0, dup FROM tbl;

Then apply a suitable AND / OR between the two to finish the logic.
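
On the PHP side, the mask and the "dup" flag could be computed roughly as follows. The function names letterMask, hasDup, and lettersSubset are hypothetical; this is only a sketch of the bit layout described above ('a' in bit 0 through 'z' in bit 25), not a complete matcher:

```php
<?php
// Hypothetical helper: 26-bit mask with 'a' in bit 0 ... 'z' in bit 25.
function letterMask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $ch) {
        if ($ch >= 'a' && $ch <= 'z') {
            $mask |= 1 << (ord($ch) - ord('a'));
        }
    }
    return $mask;
}

// Hypothetical helper: true when any letter occurs more than once.
function hasDup(string $word): bool
{
    $chars = str_split(strtolower($word));
    return count(array_unique($chars)) < count($chars);
}

// True when every bit set in $candidate is also set in $allowed.
// The parentheses matter: in PHP, === binds tighter than &.
function lettersSubset(int $candidate, int $allowed): bool
{
    return ($candidate & ~$allowed) === 0;
}

echo decbin(letterMask('bad')), PHP_EOL; // 1011
echo decbin(letterMask('bed')), PHP_EOL; // 11010
var_dump(lettersSubset(letterMask('ad'), letterMask('bad')));  // bool(true)
var_dump(lettersSubset(letterMask('bed'), letterMask('bad'))); // bool(false)
var_dump(hasDup('dad'), hasDup('bad'));                        // bool(true) bool(false)
```

The mask alone cannot tell 'dad' from 'ad', which is why the separate duplicate flag (or a per-letter count) is still needed for the final decision.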

Answer 3

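Because of MySQL's limited regex support, the best I can do is a PHP script that generates the query, assuming the input contains only English letters. It seems that an expression that excludes invalid words is easier to write than one that includes valid ones.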

<?php
$inputword = str_split('msikes');
$counter = array();
foreach (range('a', 'z') as $l) { // covers all 26 letters, including 'z'
    $counter[$l] = 0;
}
foreach ($inputword as $l) {
    $counter[$l]++;
}
$nots = '';
foreach ($counter as $l => $c) {
    if (!$c) {
        $nots .= $l;
        unset($counter[$l]);
    }
}
$conditions = array();
if(!empty($nots)) {
    // exclude words that have letters not given
    $conditions[] = "[" . $nots . "]";
}
foreach ($counter as $l => $c) {
    $letters = array();
    for ($i = 0; $i <= $c; $i++) {
        $letters[] = $l;
    }
    // exclude words that have the current letter more times than given
    $conditions[] = implode('.*', $letters); 
}
$sql = "SELECT word FROM words WHERE word NOT RLIKE '" . implode('|', $conditions) . "'";
echo $sql;
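
To try the exclusion idea without a database, the same kind of pattern can be built and tested locally with preg_match. The function name exclusionPattern is hypothetical, and the sketch assumes that PCRE and MySQL's regex engine treat this simple pattern (a character class plus .* alternatives) the same way:

```php
<?php
// Hypothetical helper: builds a pattern that matches words which must be
// EXCLUDED, i.e. words containing a letter that was not given, or containing
// a given letter more often than it was entered.
function exclusionPattern(string $input): string
{
    $counts = array_fill_keys(range('a', 'z'), 0);
    foreach (str_split(strtolower($input)) as $l) {
        if (isset($counts[$l])) {
            $counts[$l]++;
        }
    }
    // Letters that were not entered at all.
    $missing = implode('', array_keys($counts, 0, true));
    $parts = $missing === '' ? [] : ['[' . $missing . ']'];
    // For a letter entered c times, c + 1 occurrences are too many.
    foreach (array_filter($counts) as $l => $c) {
        $parts[] = implode('.*', array_fill(0, $c + 1, $l));
    }
    return implode('|', $parts);
}

$pattern = '/' . exclusionPattern('msikes') . '/';
foreach (['kiss', 'meek', 'skim', 'mikes', 'misses'] as $word) {
    echo $word, ': ', preg_match($pattern, $word) ? 'rejected' : 'allowed', PHP_EOL;
}
```

For the input msikes this allows kiss, skim, and mikes, and rejects meek (two e's) and misses (three s's).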

Answer 4

Something like this might work for you:

// Input Word
$WORD = strtolower('msikes');

// Alpha Array
$Alpha = range('a', 'z');

// Turn it into letters.
$Splited    = str_split($WORD);
$Letters    = array();
// Count occurrence of each letter, use letter as key to make it unique
foreach( $Splited as $Letter ) {
    $Letters[$Letter] = array_key_exists($Letter, $Letters) ? $Letters[$Letter] + 1 : 1;
}

// Build a list of letters that shouldn't be present in the word
$ShouldNotExists = array_filter($Alpha, function ($Letter) use ($Letters) {
    return ! array_key_exists($Letter, $Letters);
});

#### Building SQL Statement
// Letters to skip
$SkipLetters = array();
foreach( $ShouldNotExists as $SkipLetter ) {
    $SkipLetters[] = "`count_{$SkipLetter}` = 0"; // the edited schema has count_* columns instead of has_*
}
// count condition (for multiple occurrences)
$CountLetters = array();
foreach( $Letters as $K => $V ) {
    $CountLetters[] = "`count_{$K}` <= {$V}";
}

$SQL = 'SELECT `word` FROM `words` WHERE '.PHP_EOL;
$SQL .= '('.implode(' AND ', $SkipLetters).')'.PHP_EOL;
$SQL .= ' AND ('.implode(' AND ', $CountLetters).')'.PHP_EOL;
$SQL .= ' ORDER BY LENGTH(`word`), `word`'.PHP_EOL;

echo $SQL;
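
The question also mentions grouping by word length as a bonus. Once the matching words have been fetched, grouping them on the PHP side could look roughly like this. The function name groupByLength and the sample word list are made up for illustration:

```php
<?php
// Hypothetical helper: bucket words by their length, shortest bucket first.
function groupByLength(array $words): array
{
    $groups = [];
    foreach ($words as $word) {
        $groups[strlen($word)][] = $word;
    }
    ksort($groups); // order the buckets by length
    return $groups;
}

// Illustrative input, standing in for the rows returned by the query.
$groups = groupByLength(['aa', 'aal', 'aals', 'kiss', 'ski']);
foreach ($groups as $length => $words) {
    echo $length, ' letters: ', implode(', ', $words), PHP_EOL;
}
```

Because the query already orders by length and then alphabetically, the words inside each bucket keep their alphabetical order.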

How to make my word unscrambler return more relevant results
