Question

如何匹配任何重复n次的字符？

示例：

for input: abcdbcdcdd
for n=1:   ..........
for n=2:    .........
for n=3:     .. .....
for n=4:      .  . ..
for n=5:   no matches

经过几个小时，我最好的是这个表达

(\w)(?=(?:.*\1){n-1,}) //where n is variable

使用lookahead。但是这个表达式的问题是：

for input: abcdbcdcdd
for n=1    .......... 
for n=2     ... .. .
for n=3      ..  .
for n=4       .
for n=5    no matches

正如您所看到的，当前瞻符合某个角色时，让我们看一下 for n=4 行，d前瞻断言得到满足并且首先d与正则表达式匹配。但剩余的d未匹配，因为他们之前还有3 d个。{/ p>

我希望我能清楚地说出这个问题。希望您的解决方案，提前感谢。

Answer 1

正则表达式（和有限自动机）无法计入任意整数。它们只能计入预定义的整数，幸运的是这是你的情况。

如果我们首先构建一个非确定性有限自动机（NFA）广告然后将其转换为正则表达式，那么解决这个问题要容易得多。

所以下面的自动机为n = 2，输入字母= {a，b，c，d}

将匹配任何char重复2次的字符串。如果没有字符有2次重复（所有字符显示少于或多于两次）字符串将不匹配。

将其转换为正则表达式应该看起来像

"^([^a]*a[^a]*a[^a]*)|([^b]*b[^b]*b[^b]*)|([^b]*c[^c]*c[^C]*)|([^d]*d[^d]*d[^d]*)$"

如果输入字母很大，这可能会出现问题，因此应该以某种方式缩短正则表达式，但我现在无法想到它。

Answer 2

让我们寻找n = 4行，d的前瞻断言得到满足   并且首先由正则表达式匹配。   但剩下的d不匹配，因为他们不再有3个d   在他们之前。


显然，没有正则表达式，这是一个非常简单的字符串操作     问题。我试图用正则表达式进行此操作。

与任何正则表达式实现一样，答案取决于正则表达式的风格。您可以使用.net正则表达式引擎创建解决方案，因为它允许可变宽度的外观。

另外，我将在下面为perl兼容/类似正则表达式提供更通用的解决方案。

.net解决方案

作为@PetSerAl pointed out in his answer，使用可变宽度的lookbehinds，我们可以断言回字符串的开始，并检查是否有 n 次出现。
ideone demo

Python中的正则表达式模块
您可以使用python在regex module by Matthew Barnett中实现此解决方案，这也允许使用可变宽度的外观。

>>> import regex
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){2})\A.*)', 'abcdbcdcdd')
['b', 'c', 'd', 'b', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){3})\A.*)', 'abcdbcdcdd')
['c', 'd', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){4})\A.*)', 'abcdbcdcdd')
['d', 'd', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){5})\A.*)', 'abcdbcdcdd')
[]

广义解决方案

在pcre或任何＆＃34; perl-like＆＃34;味道，没有解决方案实际上会为每个重复的字符返回匹配，但我们可以为每个字符创建一个，只有一个捕获。

策略

对于任何给定的 n ，逻辑涉及：

早期匹配：匹配并捕获每个字符，然后至少 n 更多出现。
最终捕获：
- 匹配并捕获一个字符，然后完全 n-1 出现，
- 还会捕获以下每一个事件。

实施例

for n = 3
input = abcdbcdcdd

角色c仅 M 只有一次（作为最终），并且以下两次出现也是 C 在同一场比赛中出现：

abcdbcdcdd
  M  C C

并且字符d是（早期） M 一次：

abcdbcdcdd
   M

和（最后） M 再次出现， C 适应其余时间：

abcdbcdcdd
      M CC

正则表达式

/(\w)                        # match 1 character
(?:
    (?=(?:.*?\1){≪N≫})     # [1] followed by other ≪N≫ occurrences
  |                          #   OR
    (?=                      # [2] followed by:
        (?:(?!\1).)*(\1)     #      2nd occurence <captured>
        (?:(?!\1).)*(\1)     #      3rd occurence <captured>
        ≪repeat previous≫  #      repeat subpattern (n-1) times
                             #     *exactly (n-1) times*
        (?!.*?\1)            #     not followed by another occurence
    )
)/xg

n =

/(\w)(?:(?=(?:.*?\1){2})|(?=(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo
/(\w)(?:(?=(?:.*?\1){3})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo
/(\w)(?:(?=(?:.*?\1){4})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo
......等。

伪代码生成模式

// Variables: N (int)

character = "(\w)"
early_match = "(?=(?:.*?\1){" + N + "})"

final_match = "(?="
for i = 1; i < N; i++
    final_match += "(?:(?!\1).)*(\1)"
final_match += "(?!.*?\1))"

pattern = character + "(?:" + early_match + "|" + final_match + ")"

JavaScript代码

我将使用javascript显示实现，因为我们可以在此检查结果（如果它在javascript中有效，它可以在任何与perl兼容的正则表达式中运行，包括.net，{ {3}}，java，python，ruby以及实施perl等所有语言。）

＆＃13;

var str = 'abcdbcdcdd';
var pattern, re, match, N, i;
var output = "";

// We'll show the results for N = 2, 3 and 4
for (N = 2; N <= 4; N++) {
    // Generate pattern
    pattern = "(\\w)(?:(?=(?:.*?\\1){" + N + "})|(?=";
    for (i = 1; i < N; i++) {
        pattern += "(?:(?!\\1).)*(\\1)";
    }
    pattern += "(?!.*?\\1)))";
    re = new RegExp(pattern, "g");
    
    output += "<h3>N = " + N + "</h3><pre>Pattern: " + pattern + "\nText: " + str;
    
    // Loop all matches
    while ((match = re.exec(str)) !== null) {
        output += "\nPos: " + match.index + "\tMatch:";
        // Loop all captures
        x = 1;
        while (match[x] != null) {
            output += " " + match[x];
            x++;
        }
    }
    
    output += "</pre>";
}

document.write(output);

＆＃13;

Python3代码

根据OP的要求，我要链接到 pcre

Answer 3

使用.NET正则表达式，您可以执行以下操作：

(\w)(?<=(?=(?:.*\1){n})^.*) where n is variable

其中：

(\w) - 在第一组中捕获的任何角色。
(?<=^.*) - lookbehind断言，它将我们返回到字符串的开头。
(?=(?:.*\1){n}) - 超前断言，查看字符串是否包含该字符的n个实例。

Demo

Answer 4

我不会为此使用正则表达式。我会使用像python这样的脚本语言。试试这个python函数：

alpha = 'abcdefghijklmnopqrstuvwxyz'
def get_matched_chars(n, s):
    s = s.lower()
    return [char for char in alpha if s.count(char) == n]

该函数将返回一个字符列表，所有字符都出现在字符串s中n次。请记住，我只在字母表中包含字母。您可以更改alpha以表示您想要匹配的任何内容。

REGEX - 匹配任何重复n次的字符

4 个答案: