Searching a Scrabble dictionary for words with a blank tile (" ") character

Time: 2013-05-11 20:25:02

Tags: java regex string hashset

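Suppose I have a String named s that represents a move on the Scrabble board: "AARDV RK"

I have a HashSet<String> named dict that contains the entire Scrabble dictionary (about 180,000 words!).

How can I use a regex to search dict for s, with the blank character standing for any uppercase letter?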

3 answers:

Answer 0: (score: 0)

Something like this:

^AARDV[A-Z]RK$
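
The pattern above can be generated from the move string instead of being typed by hand. The following is a minimal sketch, assuming the move contains only uppercase letters and spaces for blanks; the class and method names (BlankTileSearch, toPattern, search) are made up for illustration. Because it uses matches(), which tests the whole entry, the ^ and $ anchors are not needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answers.
final class BlankTileSearch {

    // Turn a move such as "AARDV RK" into the regex AARDV[A-Z]RK.
    static Pattern toPattern(String move) {
        StringBuilder regex = new StringBuilder();
        for (char c : move.toCharArray()) {
            // A space is a blank tile; every other character is taken literally.
            regex.append(c == ' ' ? "[A-Z]" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // Return every dictionary entry that fits the move, sorted for stable output.
    static List<String> search(Set<String> dict, String move) {
        Pattern pattern = toPattern(move); // compiled once, reused for every entry
        List<String> hits = new ArrayList<>();
        for (String entry : dict) {
            if (pattern.matcher(entry).matches()) {
                hits.add(entry);
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Calling BlankTileSearch.search(dict, "AARDV RK") walks the whole set once and collects every entry that fits.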

Answer 1: (score: 0)

Since the dictionary contains nothing but uppercase letters, the simplest option is the best one:

final Pattern p = Pattern.compile("AARDV.RK");
for (String entry : dict)
  if (p.matcher(entry).matches()) return entry;
return null;

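The wildcard . matches any character at that position, which saves you the slight penalty of a redundant check on that character. Also note that it is important to compile the regex up front rather than recompiling it for every entry.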
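
Since dict is a HashSet, a different technique is also possible: skip the regex and try each of the 26 letters in the blank position, which costs 26 hash lookups instead of a scan over all ~180,000 entries. This is not what the answer above does; it is a sketch under the assumption that the move contains at most one blank and only uppercase letters otherwise. The names BlankTileLookup and search are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical alternative to the regex scan, relying on HashSet.contains.
final class BlankTileLookup {

    // Assumes at most one blank (space) in the move.
    static List<String> search(Set<String> dict, String move) {
        List<String> hits = new ArrayList<>();
        int blank = move.indexOf(' ');
        if (blank < 0) {
            // No blank tile: a single lookup is enough.
            if (dict.contains(move)) {
                hits.add(move);
            }
            return hits;
        }
        char[] chars = move.toCharArray();
        for (char letter = 'A'; letter <= 'Z'; letter++) {
            chars[blank] = letter;               // substitute the blank
            String candidate = new String(chars);
            if (dict.contains(candidate)) {      // average O(1) hash lookup
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With two blanks the same idea would need 26 × 26 candidates, which is still far fewer checks than a full dictionary scan.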

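Answer 2: (score: 0)

Description

Instead of looking for a random letter, you could just look for all the words that fit the player's current tray.

Consider the following PowerShell example of regex plus logic. Here I use logic to build a regex based on the tiles the player currently holds. The resulting regex has two distinct parts: the first matches between 1 and the total count of each letter in the player's tray, and the second matches between 0 and the count + 1 of one letter in the player's tray. So if the player has 2 A's in the tray, the regex will try to match between 0 and 3 A's, while still requiring between 1 and the total count for every other letter. This process is repeated for each letter.

So, as an example: if the player has aardvrk in the tray, the regex looks for all words that contain all of those letters, such as aardvark. aardvarks would also match, but we eliminate it later by filtering the matches on word length with where {$_.Length -le $($PlayerTiles.Length + 1)}. A word can therefore be at most one tile longer than the player's tray.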
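
Because the question is tagged java, here is a rough Java counterpart of the regex-building logic described above. It is a sketch, not a port verified against the PowerShell script: it assumes tiles and dictionary words contain only the letters A-Z, and the names TrayRegex, countTiles, block, build and candidates are invented for illustration. Only lookaheads are used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical Java sketch of the "fit the tray" regex described above.
final class TrayRegex {

    // Count how many of each tile the player holds (tiles assumed to be A-Z).
    static Map<Character, Integer> countTiles(String tiles) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : tiles.toUpperCase().toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // One lookahead block: the word holds between min and max copies of letter.
    static String block(char letter, int min, int max) {
        return "(?=^([^" + letter + "]*?" + letter + "){" + min + "," + max + "}"
                + "(?![^" + letter + "]*?" + letter + "))";
    }

    // First alternative: exact tray counts. Then one alternative per letter
    // that tolerates zero copies or one extra copy of that letter.
    static Pattern build(String tiles) {
        Map<Character, Integer> counts = countTiles(tiles);
        List<String> alternatives = new ArrayList<>();
        StringBuilder same = new StringBuilder();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            same.append(block(e.getKey(), 1, e.getValue()));
        }
        alternatives.add(same.toString());
        for (char extra : counts.keySet()) {
            StringBuilder alt = new StringBuilder();
            for (Map.Entry<Character, Integer> e : counts.entrySet()) {
                char letter = e.getKey();
                int have = e.getValue();
                alt.append(letter == extra ? block(letter, 0, have + 1)
                                           : block(letter, 1, have));
            }
            alternatives.add(alt.toString());
        }
        return Pattern.compile(String.join("|", alternatives));
    }

    // Keep words that match and are at most one tile longer than the tray.
    static List<String> candidates(Iterable<String> dict, String tiles) {
        Pattern pattern = build(tiles);
        List<String> out = new ArrayList<>();
        for (String word : dict) {
            if (word.length() <= tiles.length() + 1 && pattern.matcher(word).find()) {
                out.add(word);
            }
        }
        return out;
    }
}
```

The pattern consists only of zero-width assertions, so find() is used rather than matches(); the ^ inside each lookahead pins the check to the start of the word.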

Next, I build a regex that finds the letters in each matched word that the player does not currently have in the tray.
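
In Java the same information can be obtained without a regex by counting tiles. The sketch below swaps the lookbehind-based regex for plain counting, so it is a substitute technique rather than a translation of the script; it assumes both strings contain only the letters A-Z, and the names MissingTiles and missing are hypothetical.

```java
// Hypothetical counting-based substitute for the missing-letter regex.
final class MissingTiles {

    // Returns the letters of word that the tray cannot supply, in word order.
    static String missing(String word, String tiles) {
        int[] have = new int[26];
        for (char c : tiles.toUpperCase().toCharArray()) {
            have[c - 'A']++;                 // one slot per letter A-Z
        }
        StringBuilder missing = new StringBuilder();
        for (char c : word.toUpperCase().toCharArray()) {
            if (have[c - 'A'] > 0) {
                have[c - 'A']--;             // use up a tile from the tray
            } else {
                missing.append(c);           // no tile left for this letter
            }
        }
        return missing.toString();
    }
}
```

For the tray aardvrk, MissingTiles.missing("AARDVARK", "aardvrk") yields "A", because the third A has no tile left to cover it.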

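Of course there are edge cases where this particular logic can fail, for example when the player is missing two of the letters needed to spell a word. Knowing about such words can still be helpful when the board layout happens to contain the letter you are looking for. You could handle this by evaluating the board layout and including every single letter on the board as if it were part of the player's tray. That evaluation would need to be smart enough to recognize letters that cannot be used because of the board layout. It could also be made smart enough to identify multi-character strings on the board that can legally be used. But all of that is beyond the scope of the original question.

Notes

Depending on your language of choice, you may need to replace the *? inside any lookaround with something like {0,100}. This is because of how some languages [like Java] implement lookarounds, where the searched string may be of undetermined size.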

Source code

    $Matches = @()
    [array]$Dictionary = @()

    $Dictionary += 'AARDVARK'
    $Dictionary += 'AARDVRKS'
    $Dictionary += 'AARDVARKS'
    $Dictionary += 'ANTHILL'
    $Dictionary += 'JUMPING'
    $Dictionary += 'HILLSIDE'
    $Dictionary += 'KITTENS'
    $Dictionary += 'LOVER'
    $Dictionary += 'LOVE'
    $Dictionary += 'LOVES'
    $Dictionary += 'LOVELY'
    $Dictionary += 'OLIVE'
    $Dictionary += 'VOTE'


    $PlayerTiles = "aardvrk"

Function funBuildRegexForPlayerTiles ([string]$GivenTiles) {

    # split the GivenTiles so each letter is separate, and store these in a hashtable so the letter is the key name and the number of times it's seen is the value. This deduplicates each letter
    [hashtable]$SearchForTiles = @{}
    foreach ($Letter in $GivenTiles[0..$($GivenTiles.Length - 1)] ) {
        $SearchForTiles[$Letter] += 1
        } # next letter

    # build regex for tiles to match just the tiles we have 
    [string]$SameNumberRegex = ""
    foreach ($Letter in $SearchForTiles.Keys) {
        $SameNumberRegex += "(?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter))"
        } # next letter


    # add to the regex to include one extra letter of each type. 
    [array]$ExtraLetterRegex = @()
    foreach ($MasterLetter in $SearchForTiles.Keys) {
        [string]$TempRegex = ""
        foreach ($Letter in $SearchForTiles.Keys) {
            if ($MasterLetter -ieq $Letter) {
                # this forces each letter to allow zero to one extra of itself in the dictionary string. This allows us to match words which would have all the other letters and none of this letter
                $TempRegex += "(?=^([^$Letter]*?$Letter){0,$($SearchForTiles[$Letter] + 1)}(?![^$Letter]*?$Letter))"

                } else {
                # All the rest of these tiles on this regex section will need to have just the number of tiles the player has
                $TempRegex += "(?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter))"
                } # end if
            } # next letter
        $ExtraLetterRegex += $TempRegex

        Write-Host "To match an extra '$MasterLetter': " $TempRegex
        } # next MasterLetter

    # put it all together
    [array]$AllRegexs = @()
    $AllRegexs += $SameNumberRegex
    $AllRegexs += $ExtraLetterRegex


    # stitch all the regexs together to make a massive regex 
    [string]$Output = $AllRegexs -join "|"

    return $Output
    } # end function funBuildRegexForPlayerTiles        


Function funBuildMissingLetterRegex ([string]$GivenTiles) {
    # split the GivenTiles so each letter is separate, and store these in a hashtable so the letter is the key name and the number of times it's seen is the value. This deduplicates each letter
    [hashtable]$SearchForTiles = @{}
    foreach ($Letter in $GivenTiles[0..$($GivenTiles.Length - 1)] ) {
        $SearchForTiles[$Letter] += 1
        } # next letter

    [array]$MissingLetterRegex = @()
    # include any letters which do not match the current tiles
    $MissingLetterRegex += "(?i)([^$($SearchForTiles.Keys -join '')])"

    # build the regex to find the missing tiles
    foreach ($Letter in $SearchForTiles.Keys) {
        $MissingLetterRegex += "(?i)(?<=($Letter[^$Letter]*?){$($SearchForTiles[$Letter])})($Letter)"
        } # next letter

    [string]$Output = $MissingLetterRegex -join "|"
    return $Output
    } # end function


    [string]$Regex = funBuildRegexForPlayerTiles -GivenTiles $PlayerTiles
    Write-Host "Player tiles '$PlayerTiles'"
    Write-Host "Regex = '$Regex'"
    Write-Host "Matching words = "  
    $MatchedWords = $Dictionary -imatch $Regex | where {$_.Length -le $($PlayerTiles.Length + 1)}

    [string]$MissingLetterRegex = funBuildMissingLetterRegex $PlayerTiles
    foreach ($Word in $MatchedWords) {
        Write-Host $Word -NoNewline
        # find all the letters for which the player doesn't have a matching tile
        [array]$MissingTiles = ([regex]"$MissingLetterRegex").matches($Word) | foreach {
            Write-Output $_.Groups[0].Value
            } # next match
        Write-Host "`tLetters you are missing to spell this work '$($MissingTiles -join '')'"
        } # next word

    Write-Host -------------------------------

    $PlayerTiles = "OLLVE"
    [hashtable]$SearchForTiles = @{}

    # build regex for tiles
    [string]$Regex = funBuildRegexForPlayerTiles -GivenTiles $PlayerTiles


    Write-Host "Player tiles '$PlayerTiles'"
    Write-Host "Regex = '$Regex'"
    Write-Host
    Write-Host "Matching words = "  
    $MatchedWords = $Dictionary -imatch $Regex | where {$_.Length -le $($PlayerTiles.Length + 1)}

    [string]$MissingLetterRegex = funBuildMissingLetterRegex $PlayerTiles
    foreach ($Word in $MatchedWords) {
        Write-Host $Word -NoNewline
        # find all the letters for which the player doesn't have a matching tile
        [array]$MissingTiles = ([regex]"$MissingLetterRegex").matches($Word) | foreach {
            Write-Output $_.Groups[0].Value
            } # next match
        Write-Host "`tLetters you are missing to spell this work '$($MissingTiles -join '')'"
        } # next word

Output

To match an extra 'r':  (?=^([^r]*?r){0,3}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))
To match an extra 'v':  (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){0,2}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))
To match an extra 'a':  (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){0,3}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))
To match an extra 'k':  (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){0,2}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))
To match an extra 'd':  (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){0,2}(?![^d]*?d))
Player tiles 'aardvrk'
Regex = '(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){0,3}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){0,2}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){0,3}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){0,2}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){0,2}(?![^d]*?d))'
Matching words = 
AARDVARK    Letters you are missing to spell this work 'A'
AARDVRKS    Letters you are missing to spell this work 'S'
-------------------------------
To match an extra 'O':  (?=^([^O]*?O){0,2}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))
To match an extra 'E':  (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){0,2}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))
To match an extra 'L':  (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){0,3}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))
To match an extra 'V':  (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){0,2}(?![^V]*?V))
Player tiles 'OLLVE'
Regex = '(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){0,2}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){0,2}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){0,3}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){0,2}(?![^V]*?V))'

Matching words = 
LOVER   Letters you are missing to spell this work 'R'
LOVE    Letters you are missing to spell this work ''
LOVES   Letters you are missing to spell this work 'S'
LOVELY  Letters you are missing to spell this work 'Y'
OLIVE   Letters you are missing to spell this work 'I'
VOTE    Letters you are missing to spell this work 'T'

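Summary

In the first part, where we look for matching words, I use a regex made up of these blocks: (?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter)). All of these blocks are separated by the | (or) operator.

  • (?= starts a zero-width assertion
    • ^ matches the start of the string
    • ( creates a group for the required character sequence
    • [^$Letter]*? matches any character except the letter we are looking for, zero or more times, non-greedy
    • $Letter matches the letter
    • ) closes the group
  • { forces the group to occur
    • 1 at least once
    • ,
    • $($SearchForTiles[$Letter]) at most the total number of this tile the player has
    • } ends the quantity check
  • (?! uses a negative lookahead to prevent any
    • [^$Letter]*? any number of characters that are not this letter
    • $Letter followed by this letter
    • ) closes the lookahead
  • ) closes the zero-width assertion, which in effect checks the count of this letter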
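When searching for the letters missing from a word, I use (?i)([^$($SearchForTiles.Keys -join '')]) followed by one of these blocks for each letter: (?i)(?<=($Letter[^$Letter]*?){$($SearchForTiles[$Letter])})($Letter). All of these blocks are separated by the | (or) operator.

  • (?i) forces case-insensitive matching
    • ( starts the group
    • [^ excludes these characters
    • $($SearchForTiles.Keys -join '') takes each deduplicated letter in the player's tile set and joins them together
    • ] ends the character set
    • ) in effect returns every letter that is not in the player's tray
  • | or operator
  • for each letter in the player's tray,
  • one group like this follows
  • (?i) forces case-insensitive matching
  • (?<= starts a lookbehind
  • ( starts a group of characters that must appear in this order
    • $Letter looks for this letter
    • [^$Letter] followed by any character that is not this letter
    • *? zero or more times
    • ) closes the group of characters that must appear in this order
    • {$($SearchForTiles[$Letter])} the group must occur once for each tile of this letter the player holds before we start matching the missing letter
    • ) closes the lookbehind
  • ($Letter) matches the letter we are looking for; if it matches, the player is missing this letter from the tray, so the letter is returned.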
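
The first alternative of this expression, the negated character class, carries over to Java directly. The sketch below reproduces only that part and deliberately leaves out the lookbehind blocks, so surplus copies of a letter the player does hold are not reported. It assumes a non-empty tray and A-Z letters only; the names NotInTray and lettersNotInTray are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the negated-character-class part only.
final class NotInTray {

    // Returns the letters of word that do not occur in the tray at all.
    static String lettersNotInTray(String word, String tiles) {
        // Deduplicate the tray letters, like $SearchForTiles.Keys -join ''.
        StringBuilder distinct = new StringBuilder();
        for (char c : tiles.toUpperCase().toCharArray()) {
            if (distinct.indexOf(String.valueOf(c)) < 0) {
                distinct.append(c);
            }
        }
        // [^...] matches any single character that is not a tray letter.
        Matcher m = Pattern.compile("[^" + distinct + "]").matcher(word.toUpperCase());
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }
}
```

For the tray aardvrk this returns "S" for AARDVRKS and an empty string for AARDVARK, since the extra A is only caught by the lookbehind part that is omitted here.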