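I'm building a word unscrambler (PHP/MySQL) that takes user input of between 2 and 8 letters and returns words of between 2 and 8 letters that can be made from those letters. A word need not use all of the letters, but it must never use more letters than were supplied.
The user will enter something like MSIKE or MSIKEI (two i's), or any other combination of letters, possibly with a letter occurring more than once.
The query below finds all words that contain M, S, I, K, or E.
However, the query also returns words that contain a letter more times than it was requested. For example, the word meek is returned even though the user did not enter two e's, and the word kiss is returned even though the user did not enter s twice.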
SELECT word
FROM words
WHERE word REGEXP '[msike]'
AND has_a=0
AND has_b=0
AND has_c=0
AND has_d=0
-- (we skip e), or we could add has_e = 1
AND has_f=0
-- ...and so on, skipping the letters m, s, i, k, and e
AND has_w=0
AND has_x=0
AND has_y=0
AND has_z=0
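Note that the columns has_a, has_b, etc. are 1 if the letter occurs in the word and 0 if it does not.
I'm open to any changes to the table schema.
This site, http://grecni.com/texttwist.php, is a good example of what I'm trying to emulate.
The question is how to modify the query so that it does not return words with multiple occurrences of a letter unless the user specifically entered that letter multiple times. Grouping by word length would be an added bonus.
Thanks very much.
EDIT: I changed the database at the suggestion of @awei. has_{letter} is now count_{letter} and stores the total number of occurrences of that letter in the word. This could be useful when the user enters a letter multiple times. Example: the user enters MSIKES (two s's).
Additionally, I have abandoned the REGEXP approach shown in the original SQL statement. I'm now trying to do most of the work on the PHP side, but many hurdles remain.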
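The matching rule described above (a word may use each letter at most as many times as it was supplied) can be checked on the PHP side with a per-letter count comparison. This is a minimal sketch, not the asker's code; canBuildFrom is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the question): returns true when $word can be
// spelled from $letters without using any letter more often than supplied.
function canBuildFrom(string $word, string $letters): bool
{
    // count_chars(..., 1) returns [byte value => occurrences] for bytes that occur.
    $available = count_chars(strtolower($letters), 1);
    foreach (count_chars(strtolower($word), 1) as $byte => $needed) {
        if ($needed > ($available[$byte] ?? 0)) {
            return false; // the word needs more of this letter than was supplied
        }
    }
    return true;
}

var_dump(canBuildFrom('skim', 'MSIKE'));  // bool(true)
var_dump(canBuildFrom('meek', 'MSIKE'));  // bool(false): needs two e's
var_dump(canBuildFrom('kiss', 'MSIKE'));  // bool(false): needs two s's
var_dump(canBuildFrom('kiss', 'MSIKES')); // bool(true)
```

A check like this is only practical on a candidate list that the database has already narrowed down, which is why the query itself still matters.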
EDIT: Included the first 10 rows from the table
id word alpha otcwl ospd csw sowpods dictionary enable vowels consonants start_with end_with end_with_ing end_with_ly end_with_xy count_a count_b count_c count_d count_e count_f count_g count_h count_i count_j count_k count_l count_m count_n count_o count_p count_q count_r count_s count_t count_u count_v count_w count_x count_y count_z q_no_u letter_count scrabble_points wwf_points status date_added
1 aa aa 1 0 0 1 1 1 aa a a 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 1 2015-11-12 05:39:45
2 aah aah 1 0 0 1 0 1 aa h a h 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 6 5 1 2015-11-12 05:39:45
3 aahed aadeh 1 0 0 1 0 1 aae hd a d 0 0 0 2 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 9 8 1 2015-11-12 05:39:45
4 aahing aaghin 1 0 0 1 0 1 aai hng a g 1 0 0 2 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 6 10 11 1 2015-11-12 05:39:45
5 aahs aahs 1 0 0 1 0 1 aa hs a s 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 7 6 1 2015-11-12 05:39:45
6 aal aal 1 0 0 1 0 1 aa l a l 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 4 1 2015-11-12 05:39:45
7 aalii aaiil 1 0 0 1 1 1 aaii l a i 0 0 0 2 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 6 1 2015-11-12 05:39:45
8 aaliis aaiils 1 0 0 1 0 1 aaii ls a s 0 0 0 2 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 6 6 7 1 2015-11-12 05:39:45
9 aals aals 1 0 0 1 0 1 aa ls a s 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 4 5 1 2015-11-12 05:39:45
10 aardvark aaadkrrv 1 0 0 1 1 1 aaa rdvrk a k 0 0 0 3 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 8 16 17 1 2015-11-12 05:39:45
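Answer 0 (score: 10)
I think you have already done the hard work with your modified schema. All you need to do now is modify the query so that the count of each letter is <= the number of times the user supplied that letter.
E.g., if the user entered "ALIAS":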
SELECT word
FROM words
WHERE count_a <= 2
AND count_b <= 0
AND count_c <= 0
AND count_d <= 0
AND count_e <= 0
AND count_f <= 0
AND count_g <= 0
AND count_h <= 0
AND count_i <= 1
AND count_j <= 0
AND count_k <= 0
AND count_l <= 1
AND count_m <= 0
AND count_n <= 0
AND count_o <= 0
AND count_p <= 0
AND count_q <= 0
AND count_r <= 0
AND count_s <= 1
AND count_t <= 0
AND count_u <= 0
AND count_v <= 0
AND count_w <= 0
AND count_x <= 0
AND count_y <= 0
AND count_z <= 0
ORDER BY CHAR_LENGTH(word), word;
Note: As requested, this sorts by word length and then alphabetically. I have used <= even for <= 0 just to make it easier to modify by hand for other letters.
This returns "aa", "aal" and "aals" (but not "aalii" or "aaliis", since they both have two "i"s).
See the SQL Fiddle demo.
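Writing the 26 conditions by hand gets tedious, so the WHERE clause can be generated from the user's input. The following is a minimal sketch, assuming the words table and the count_a ... count_z columns from the question; buildUnscrambleQuery is a hypothetical name, not part of this answer.

```php
<?php
// Hypothetical helper: generates the query shown above for arbitrary input.
// Table and column names (words, word, count_a ... count_z) follow the question's schema.
function buildUnscrambleQuery(string $letters): string
{
    // Keep only a-z, so nothing user-supplied is interpolated into the SQL;
    // the statement ends up containing only fixed column names and integers.
    $letters = preg_replace('/[^a-z]/', '', strtolower($letters));
    $supplied = count_chars($letters, 1); // [byte value => occurrences]

    $conditions = [];
    foreach (range('a', 'z') as $letter) {
        $max = $supplied[ord($letter)] ?? 0; // 0 for letters not supplied
        $conditions[] = "count_{$letter} <= {$max}";
    }

    return "SELECT word\nFROM words\nWHERE " . implode("\n  AND ", $conditions)
        . "\nORDER BY CHAR_LENGTH(word), word;";
}

echo buildUnscrambleQuery('ALIAS'), PHP_EOL;
```

For "ALIAS" this produces count_a <= 2, count_i <= 1, count_l <= 1, count_s <= 1 and <= 0 for every other letter, matching the hand-written query.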
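Answer 1 (score: 3)
Since you have two different requirements, I suggest implementing two different solutions.
If you don't care about duplicate letters, build a SET datatype with the 26 letters. Populate the bits according to which letters the word contains. This ignores duplicated letters. It also makes it easy to look for words whose letters are a subset of the given letters: (the_set & ~the_letters) = 0.
If you do care about duplicates, sort the letters in the word and store the result as a key. "msike" becomes "eikms".
Build a table containing 3 columns: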
eikms -- non unique index on this
msike -- the real word - probably good to have this as the PRIMARY KEY
SET('m','s','i','k','e') -- for the other situation.
msikei and meek would be entered as
eikms
msikei
SET('m','s','i','k','e') -- (or, if more convenient: SET('m','i','s','i','k','e'))
ekm
meek
SET('e','k','m')
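REGEXP is not practical for your task.
Edit 1
I think you would also need a column indicating whether the word contains any doubled letters. That way, you could tell that kiss is allowed for msikes but not allowed for msike.
Edit 2
A SET or an INT UNSIGNED can hold 1 bit for each of the 26 letters: 0 for not present, 1 for present.
msikes and msike would both go into the set with exactly 5 bits turned on. For the INSERT, the SET value for msikes would be 'm,s,i,k,e,s'. Since the rest needs to involve Boolean arithmetic, it may be better to use INT UNSIGNED. So...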
a is 1 (1 << 0)
b is 2 (1 << 1)
c is 4 (1 << 2)
d is 8 (1 << 3)
...
z is (1 << 25)
To INSERT, use the | operator. bad becomes
(1 << 1) | (1 << 0) | (1 << 3)
Note how the bits are laid out, with 'a' at the bottom:
SELECT BIN((1 << 1) | (1 << 0) | (1 << 3)); ==> 1011
Similarly, 'ad' is 1001. So, does 'ad' match 'bad'? The answer comes from
SELECT b'1001' & ~b'1011' = 0; ==> 1 (meaning 'true')
This means that all the letters in 'ad' (1001) are found in 'bad' (1011). Let's introduce "bed", which is 11010.
SELECT b'11010' & ~b'1011' = 0; ==> FALSE because of 'e' (10000)
But 'dad' (1001) would work fine:
SELECT b'1001' & ~b'1011' = 0; ==> TRUE
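The same bit arithmetic can be reproduced in PHP when preparing the mask values for INSERT or for a query parameter. This is a minimal sketch; letterMask and lettersSubsetOf are hypothetical helper names, and the bit layout ('a' in the lowest bit) follows the table above.

```php
<?php
// Hypothetical helper: one bit per distinct letter, 'a' in the lowest bit.
function letterMask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $ch) {
        $bit = ord($ch) - ord('a');
        if ($bit >= 0 && $bit < 26) { // ignore anything that is not a-z
            $mask |= 1 << $bit;
        }
    }
    return $mask;
}

// True when every letter of $needle also occurs in $haystack (repeats are ignored).
function lettersSubsetOf(string $needle, string $haystack): bool
{
    return (letterMask($needle) & ~letterMask($haystack)) === 0;
}

echo decbin(letterMask('bad')), PHP_EOL; // 1011
echo decbin(letterMask('bed')), PHP_EOL; // 11010
var_dump(lettersSubsetOf('ad', 'bad'));  // bool(true)
var_dump(lettersSubsetOf('bed', 'bad')); // bool(false), because of 'e'
var_dump(lettersSubsetOf('dad', 'bad')); // bool(true), the repeated 'd' is invisible to the mask
```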
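So, now comes the "dup" flag. Since 'dad' has duplicate letters but 'bad' does not, your rules say it is not a match. But it takes the "dup" flag to complete the decision.
If you have not had a course in Boolean arithmetic, well, I have just presented the first couple of chapters. If I covered it too fast, find a math book on the subject and dive in. "It isn't rocket science."
So, back to what code is needed to decide whether my_word has a subset of the letters and whether duplicate letters are allowed: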
SELECT $my_mask & ~tbl.mask = 0, dup FROM tbl;
Then do a suitable AND/OR between the two to complete the logic.
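One way to read that AND/OR step is as a three-way decision. The sketch below is an interpretation, not this answer's code: it assumes each row carries an integer mask and a 0/1 dup flag as described above, it puts the word's mask on the left as in the earlier (the_set & ~the_letters) = 0 expression, and classify, letterMask and hasDup are hypothetical names. Note that when both the word and the input repeat a letter, the two columns alone cannot decide the match (meek against msikes passes both tests), so a per-letter count comparison is still needed for those rows.

```php
<?php
// Hypothetical helper: one bit per distinct letter, 'a' in the lowest bit.
function letterMask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $ch) {
        $bit = ord($ch) - ord('a');
        if ($bit >= 0 && $bit < 26) {
            $mask |= 1 << $bit;
        }
    }
    return $mask;
}

// Hypothetical helper: true when any character occurs more than once.
function hasDup(string $word): bool
{
    $counts = count_chars(strtolower($word), 1);
    return $counts !== [] && max($counts) > 1;
}

// Decide from the stored (mask, dup) pair whether a word can match the input.
// Returns 'yes', 'no', or 'check counts' when the two columns are not enough.
function classify(string $input, int $wordMask, bool $wordDup): string
{
    if (($wordMask & ~letterMask($input)) !== 0) {
        return 'no';           // the word uses a letter that was not supplied
    }
    if (!$wordDup) {
        return 'yes';          // every letter used once, and all were supplied
    }
    if (!hasDup($input)) {
        return 'no';           // the word repeats a letter, the input does not
    }
    return 'check counts';     // both repeat letters: compare per-letter counts
}

echo classify('msike', letterMask('skim'), hasDup('skim')), PHP_EOL;  // yes
echo classify('msike', letterMask('kiss'), hasDup('kiss')), PHP_EOL;  // no
echo classify('msikes', letterMask('kiss'), hasDup('kiss')), PHP_EOL; // check counts
echo classify('msikes', letterMask('meek'), hasDup('meek')), PHP_EOL; // check counts
```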
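Answer 2 (score: 1)
Because of MySQL's limited regex support, the best I could do is a PHP script that generates the query, assuming the input contains only English letters. An expression that excludes invalid words seems to be easier to write than one that includes the valid ones.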
<?php
$inputword = str_split('msikes');

// Start every letter at zero; range() covers a through z inclusive
$counter = array();
foreach (range('a', 'z') as $l) {
    $counter[$l] = 0;
}
// Count how often each letter was supplied
foreach ($inputword as $l) {
    $counter[$l]++;
}
// Collect the letters that were not supplied at all
$nots = '';
foreach ($counter as $l => $c) {
    if (!$c) {
        $nots .= $l;
        unset($counter[$l]);
    }
}
$conditions = array();
if (!empty($nots)) {
    // exclude words that have letters not given
    $conditions[] = "[" . $nots . "]";
}
foreach ($counter as $l => $c) {
    $letters = array();
    for ($i = 0; $i <= $c; $i++) {
        $letters[] = $l;
    }
    // exclude words that have the current letter more times than given
    $conditions[] = implode('.*', $letters);
}
$sql = "SELECT word FROM words WHERE word NOT RLIKE '" . implode('|', $conditions) . "'";
echo $sql;
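To see what the generated pattern excludes, it can be tried against a few words in PHP before it is sent to MySQL. This is a minimal sketch that uses preg_match for illustration instead of MySQL's RLIKE; exclusionPattern is a hypothetical helper that mirrors the script above.

```php
<?php
// Hypothetical helper mirroring the script above: builds the exclusion pattern.
function exclusionPattern(string $input): string
{
    $counts = array_fill_keys(range('a', 'z'), 0);
    foreach (str_split(strtolower($input)) as $ch) {
        if (isset($counts[$ch])) {
            $counts[$ch]++;
        }
    }

    $parts = [];
    $absent = implode('', array_keys($counts, 0, true));
    if ($absent !== '') {
        $parts[] = "[$absent]"; // any letter that was not supplied
    }
    foreach (array_filter($counts) as $ch => $n) {
        // n + 1 copies joined by .* match a word that uses the letter too often
        $parts[] = implode('.*', array_fill(0, $n + 1, $ch));
    }
    return implode('|', $parts);
}

$regex = '/' . exclusionPattern('msikes') . '/';
foreach (['skim', 'kiss', 'meek', 'misses'] as $word) {
    echo $word, ': ', preg_match($regex, $word) ? 'excluded' : 'kept', PHP_EOL;
}
```

With the input msikes, skim and kiss are kept, while meek (two e's) and misses (three s's) are excluded.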
Answer 3 (score: 1)
Something like this might work for you:
// Input Word
$WORD = strtolower('msikes');
// Alpha Array
$Alpha = range('a', 'z');
// Turn it into letters.
$Splited = str_split($WORD);
$Letters = array();
// Count occurrence of each letter, use letter as key to make it unique
foreach( $Splited as $Letter ) {
    $Letters[$Letter] = array_key_exists($Letter, $Letters) ? $Letters[$Letter] + 1 : 1;
}
// Build a list of letters that shouldn't be present in the word
$ShouldNotExists = array_filter($Alpha, function ($Letter) use ($Letters) {
    return ! array_key_exists($Letter, $Letters);
});
#### Building SQL Statement
// Letters to skip: their count must be zero
// (the edited schema in the question has count_* columns, not has_*)
$SkipLetters = array();
foreach( $ShouldNotExists as $SkipLetter ) {
    $SkipLetters[] = "`count_{$SkipLetter}` = 0";
}
// count condition (for multiple occurrences)
$CountLetters = array();
foreach( $Letters as $K => $V ) {
    $CountLetters[] = "`count_{$K}` <= {$V}";
}
$SQL = 'SELECT `word` FROM `words` WHERE '.PHP_EOL;
$SQL .= '('.implode(' AND ', $SkipLetters).')'.PHP_EOL;
$SQL .= ' AND ('.implode(' AND ', $CountLetters).')'.PHP_EOL;
$SQL .= ' ORDER BY LENGTH(`word`), `word`'.PHP_EOL;
echo $SQL;