Question

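This is a crossword problem. For example:

The solution is a 6-letter word that starts with "r" and ends with "r".
Thus the pattern is "r....r".
The unknown 4 letters must be drawn from the pool of letters "a", "e", "i" and "p".
Each letter must be used exactly once.
We have a large list of candidate 6-letter words.

Solutions: "rapier" or "repair".

Filtering for the pattern "r....r" is trivial, but finding the words that also have [aeip] in the "unknown" slots is beyond me.

Is this problem amenable to a regex, or must it be done by exhaustive methods?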

Answer 1

Try this:

r(?:(?!\1)a()|(?!\2)e()|(?!\3)i()|(?!\4)p()){4}r

...or, more readably:

r
(?:
  (?!\1) a () |
  (?!\2) e () |
  (?!\3) i () |
  (?!\4) p ()
){4}
r

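The empty groups act as check marks, ticking off each letter as it is consumed. For example, if the word to be matched is repair, e will be the first letter matched by this construct. If the regex later tries to match another e, that alternative won't match it: the negative lookahead (?!\2) fails because group #2 has already participated in the match, and it doesn't matter that the group didn't consume anything.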
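To see the check marks at work, here is a minimal Java sketch that compiles the pattern from this answer with java.util.regex (Java is one of the flavors the answer lists below). The class and method names are made up for illustration; matches() is used so the whole word must fit.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the check-mark regex from this answer,
// unchanged apart from Java string escaping.
// Each (?!\n) succeeds only while empty group n has not matched yet.
final class CheckmarkDemo {
    static final Pattern CHECKMARK =
            Pattern.compile("r(?:(?!\\1)a()|(?!\\2)e()|(?!\\3)i()|(?!\\4)p()){4}r");

    // matches() anchors the pattern to the whole word.
    static boolean fits(String word) {
        return CHECKMARK.matcher(word).matches();
    }
}
```

With this, fits("repair") and fits("rapier") should be true, while fits("reaper") is false: its second e finds the e alternative already ticked.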

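What's really cool is that it works just as well on strings that contain duplicate letters. Take your redeem example (pattern r....m, letters d, e, e, e):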

r
(?:
  (?!\1) e () |
  (?!\2) e () |
  (?!\3) e () |
  (?!\4) d ()
){4}
m

After the first e is consumed, the first alternative is effectively disabled, so the second alternative takes over for the next e, and so on.
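Because every pool entry gets its own alternative and its own empty group, the construct can be generated from any pool, repeated letters included. The Java sketch below is one way to do that; CheckmarkBuilder and build are hypothetical names, and it assumes a pool of at most nine letters so that every backreference stays a single digit.

```java
import java.util.regex.Pattern;

// Hypothetical helper: generates the check-mark construct for an arbitrary pool.
// Alternative k is (?!\k)<letter>(), so each pool entry can be used at most once.
final class CheckmarkBuilder {
    static Pattern build(String prefix, String pool, String suffix) {
        // Assumption: at most 9 pool letters, so every backreference is one digit.
        if (pool.length() > 9) {
            throw new IllegalArgumentException("pool too long: " + pool);
        }
        StringBuilder alternatives = new StringBuilder();
        for (int k = 0; k < pool.length(); k++) {
            if (k > 0) {
                alternatives.append('|');
            }
            alternatives.append("(?!\\").append(k + 1).append(')')
                    .append(Pattern.quote(String.valueOf(pool.charAt(k))))
                    .append("()");
        }
        String regex = Pattern.quote(prefix)
                + "(?:" + alternatives + "){" + pool.length() + "}"
                + Pattern.quote(suffix);
        return Pattern.compile(regex);
    }
}
```

For example, build("r", "eeed", "m") reproduces the redeem pattern: it should match "redeem" but not "reeeem", which needs a fourth e alternative that does not exist.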

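Unfortunately, this technique doesn't work in all regex flavors. For one thing, they don't all treat empty or failed group captures the same way. The ECMAScript spec explicitly states that references to non-participating groups should always succeed, so in JavaScript every (?!\n) check fails from the start and the pattern can never match.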

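The regex flavor also has to support forward references, that is, backreferences that appear in the regex before the groups they refer to (ref). As far as I know, it should work in .NET, Java, Perl, PCRE and Ruby.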

Answer 2

Assuming you mean that the unknown letters must come from [aeip], a suitable regex could be:

/r[aeip]{4,4}r/
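A quick Java check of this pattern (the class name is hypothetical) makes its limit visible: the character class only restricts which letters may fill the slots, so a word that reuses one letter, such as raaaar, is accepted too.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the answer's pattern without the /.../ delimiters.
final class ClassOnlyDemo {
    static final Pattern SLOTS = Pattern.compile("r[aeip]{4,4}r");

    // Whole-word match; repeated letters are not detected.
    static boolean fits(String word) {
        return SLOTS.matcher(word).matches();
    }
}
```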

Answer 3

Which front-end language are you using to compare the strings: Java, .NET, ...?

Here is an example/pseudocode using Java:

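    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String mandateLetters = "aeip";
    // \b...\b matches a whole word; {4} allows exactly four letters from the pool
    // (a letter may still repeat, so this alone does not enforce "each letter once")
    String regPattern = "\\br[" + mandateLetters + "]{4}r\\b";

    Pattern pattern = Pattern.compile(regPattern);
    Matcher matcher = pattern.matcher("is this repair ");

    if (matcher.find()) {
        System.out.println(matcher.group()); // prints "repair"
    }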

Answer 4

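Why not replace each '.' in your original pattern with '[aeip]'?

You'd end up with the regex string r[aeip][aeip][aeip][aeip]r.

This could of course be shortened to r[aeip]{4,4}r, but that may be a pain to implement in the general case and probably wouldn't improve the code.

This doesn't address the issue of repeated letters. If I were coding this, I'd handle that in code outside the regexp, mostly because the regexp would get more complicated than I'd want to deal with.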
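The approach described here, a regex for the shape plus ordinary code for the letter count, might look like this in Java. It is a sketch under assumptions: OutsideRegexCheck and its methods are hypothetical, the crossword pattern is assumed to contain only literal letters and '.', and the pool only plain letters (so it is safe inside a character class). Comparing the sorted slot letters with the sorted pool also handles pools with repeated letters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper: regex for the shape, plain Java for "each letter once".
final class OutsideRegexCheck {
    // Turns "r....r" into "r[aeip][aeip][aeip][aeip]r" by replacing every '.'.
    static Pattern shape(String crosswordPattern, String pool) {
        return Pattern.compile(crosswordPattern.replace(".", "[" + pool + "]"));
    }

    // The letters in the '.' slots, sorted, must equal the sorted pool.
    static boolean fits(String word, String crosswordPattern, String pool) {
        if (!shape(crosswordPattern, pool).matcher(word).matches()) {
            return false;
        }
        StringBuilder slots = new StringBuilder();
        for (int i = 0; i < crosswordPattern.length(); i++) {
            if (crosswordPattern.charAt(i) == '.') {
                slots.append(word.charAt(i));
            }
        }
        char[] got = slots.toString().toCharArray();
        char[] want = pool.toCharArray();
        Arrays.sort(got);
        Arrays.sort(want);
        return Arrays.equals(got, want);
    }
}
```

Here fits("repair", "r....r", "aeip") should be true and fits("reaper", "r....r", "aeip") false, since the slots of reaper hold e, a, p, e instead of a, e, i, p.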

Answer 5

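So the "exactly once" part is crucial. Listing all permutations is obviously not feasible. If your language/environment supports lookaheads and backreferences, you can make it a bit easier for yourself: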

r(?=[aeip]{4,4})(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3).r

Still quite ugly, but here is how it works:

r     # match an r
(?=   # positive lookahead (doesn't advance position of "cursor" in input string)
  [aeip]{4,4}
)     # make sure that four of the desired characters are ahead
(.)   # match any character and capture it in group 1
(?!\1)# make sure that the next character is NOT the same as the previous one
(.)   # match any character and capture it in group 2
(?!\1|\2)
      # make sure that the next character is neither the first nor the second
(.)   # match any character and capture it in group 3
(?!\1|\2|\3)
      # same thing again for all three characters
.     # match another arbitrary character
r     # match an r
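Compiled with java.util.regex, the pattern should behave as described. A minimal check (class name hypothetical) accepts words whose four slot letters are distinct pool letters and rejects a word that repeats one.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the first lookahead pattern, with Java string escaping.
final class DistinctDemo {
    static final Pattern DISTINCT = Pattern.compile(
            "r(?=[aeip]{4,4})(.)(?!\\1)(.)(?!\\1|\\2)(.)(?!\\1|\\2|\\3).r");

    // Whole-word match: four distinct letters from [aeip] between two r's.
    static boolean fits(String word) {
        return DISTINCT.matcher(word).matches();
    }
}
```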


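This is neither elegant nor scalable, so you might just want to use r([aiep]{4,4})r (capturing the four key letters) and check the additional condition without a regex.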

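EDIT: In fact, the pattern above is only really useful and necessary if you just want to ensure that there are 4 distinct characters. For your specific case there is a simpler (though longer) solution, again using lookaheads: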

r(?=[^a]*a[^a]*r)(?=[^e]*e[^e]*r)(?=[^i]*i[^i]*r)(?=[^p]*p[^p]*r)[aeip]{4,4}r

Explanation:

r       # match an r
(?=     # lookahead: ensure that there is exactly one a until the next r
  [^a]* # match an arbitrary amount of non-a characters
  a     # match one a
  [^a]* # match an arbitrary amount of non-a characters
  r     # match the final r
)       # end of lookahead
(?=[^e]*e[^e]*r)  # ensure that there is exactly one e until the next r
(?=[^i]*i[^i]*r)  # ensure that there is exactly one i until the next r
(?=[^p]*p[^p]*r)  # ensure that there is exactly one p until the next r
[aeip]{4,4}r      # actually match the rest to include it in the result
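The same pattern in a small Java check (class name hypothetical). Each lookahead counts one letter, so a word with a doubled e fails even though all of its letters come from the pool.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: exactly one a, e, i and p before the closing r,
// checked by four independent lookaheads.
final class OncePerLetterDemo {
    static final Pattern ONCE_EACH = Pattern.compile(
            "r(?=[^a]*a[^a]*r)(?=[^e]*e[^e]*r)(?=[^i]*i[^i]*r)(?=[^p]*p[^p]*r)[aeip]{4,4}r");

    static boolean fits(String word) {
        return ONCE_EACH.matcher(word).matches();
    }
}
```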


For r....m with the pool deee, this can be adjusted to:

r(?=[^d]*d[^d]*m)(?=[^e]*(?:e[^e]*){3,3}m)[de]{4,4}m

This ensures that there is exactly one d and exactly three e's.
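A Java check of the adjusted redeem pattern (class name hypothetical). Note that the sketch writes the e group as (?:e[^e]*){3,3}, that is, an e followed by any non-e characters, repeated three times.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: one d and exactly three e's in the slots of r....m.
final class RedeemDemo {
    static final Pattern REDEEM = Pattern.compile(
            "r(?=[^d]*d[^d]*m)(?=[^e]*(?:e[^e]*){3,3}m)[de]{4,4}m");

    static boolean fits(String word) {
        return REDEEM.matcher(word).matches();
    }
}
```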


Answer 6

Not a pure regex solution, since it relies on sed applying several regexes in turn:

sed -n -e '/^r[aiep]\{4,\}r$/{/\([aiep]\).*\1/!p;}' YourFile

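This takes the lines where r is followed by at least four letters from the aeip set and a final r, then prints only those in which no letter from that set appears twice.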
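For comparison, the same two-stage filter sketched in Java, with a stream standing in for sed's line-by-line processing; TwoStageFilter is a hypothetical name. Like the sed address, the first stage allows four or more letters. Five or more letters drawn from a four-letter set must repeat one of them, so the second stage drops those lines anyway.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper mirroring the sed command above.
final class TwoStageFilter {
    // Stage 1: the address /^r[aiep]\{4,\}r$/.
    static final Pattern SHAPE = Pattern.compile("^r[aiep]{4,}r$");
    // Stage 2: /\([aiep]\).*\1/, i.e. some pool letter occurs twice.
    static final Pattern REPEAT = Pattern.compile("([aiep]).*\\1");

    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(line -> SHAPE.matcher(line).find())
                .filter(line -> !REPEAT.matcher(line).find()) // sed's "!p"
                .collect(Collectors.toList());
    }
}
```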

Answer 7

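A more scalable solution (no need to write \1, \2, \3, etc. for each letter or position) is to use a negative lookahead to assert that each character does not occur again later: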

^r(?:([aeip])(?!.*\1)){4}r$

More readable:

^r
(?:
  ([aeip])
  (?!.*\1)
){4}
r$

Improvements

This is a quick solution that works for the case you gave us, but here are some additional constraints for a more robust version:

If the "letter pool" can share some letters with the end of the string, include the end of the pattern in the lookahead:
```
^r(?:([aeip])(?!.*\1.*\2)){4}(r$)
```
(This may not work as intended in every regex flavor; in that case, copy-paste the end of the pattern instead of using \2.)
If some letters must appear not just once but a different fixed number of times, add a separate lookahead for all the letters that share the same count. For example, "r....r" with one "a" and one "p" but two "e"s would be matched by this regex (but "rapper" would not):
```
^r(?:([ap])(?!.*\1.*\3)|([e])(?!.*\2.*\2.*\3)){4}(r$)
```
The non-capturing group now has two alternatives: ([ap])(?!.*\1.*\3) matches an "a" or a "p" that is not followed by another one anywhere before the end, and ([e])(?!.*\2.*\2.*\3) matches an "e" that is not followed by two more before the end (so if there are three in total, it fails on the first one). By the way, this solution includes the one above, but here the end of the pattern has moved to \3 (again, see the note about flavors).
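A Java sketch of the basic pattern and of the copy-paste fallback, applied to the one a / one p / two e case (class and method names are hypothetical). As far as I can tell, java.util.regex treats a backreference to a group that has not matched yet as a failure, so the \2 and \3 forms would not constrain anything there; that is why this sketch writes .*r$ into the lookaheads instead.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the negative-lookahead approach.
final class LookaheadOnceDemo {
    // Each captured letter must not occur again anywhere later in the word.
    static final Pattern ONCE = Pattern.compile("^r(?:([aeip])(?!.*\\1)){4}r$");

    // One "a", one "p", two "e"s, with the pattern end copy-pasted as .*r$
    // instead of a reference to a trailing (r$) group.
    static final Pattern A_P_EE = Pattern.compile(
            "^r(?:([ap])(?!.*\\1.*r$)|([e])(?!.*\\2.*\\2.*r$)){4}r$");

    static boolean once(String word) {
        return ONCE.matcher(word).matches();
    }

    static boolean apee(String word) {
        return A_P_EE.matcher(word).matches();
    }
}
```

With these, apee("reaper") should be true, while apee("rapper") fails on the first p, which has another p after it.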
