Question

I want to find character sequences such as:

AAA, BBB or ZZZ inside a name (for example RAAAJ, ABBBAAS)
ABABAB or CPCPCP inside a name

Can these be found with a regular expression?

I tried this:

\\b\\w*?(\\w{2})\\w*?\\1\\w*?\\b on 'tatarak'

This finds ta in the word, but it should only match when ta occurs three or more times.

Answer 1

Try using a capturing group and a back-reference to it within the same Pattern.

String[] namesWithRepeatedOneLetter = { "RAAAJ", "ABBBAAS" };
String[] namesWithRepeatedTwoLetters = { "ABABABC", "FOOBCBCD"};
//                            | POSIX character class, ASCII letters only (a-zA-Z).
//                            | The parentheses make it capturing group 1.
//                            |           | Back-reference to group 1
//                            |           | (the same letter again)
//                            |           |  | Greedy quantifier: 2 or more further
//                            |           |  | copies, i.e. 3+ letters in a row
Pattern p0 = Pattern.compile("(\\p{Alpha})\\1{2,}");
//                                       | Greedy quantifier: the group must span 2+
//                                       | letters, so only multi-letter units count
//                                       | as a repeated sequence
Pattern p1 = Pattern.compile("(\\p{Alpha}{2,})\\1{2,}");
for (String n : namesWithRepeatedOneLetter) {
    Matcher m = p0.matcher(n);
    while (m.find()) {
        System.out.println(m.group());
    }
}
System.out.println();
for (String n: namesWithRepeatedTwoLetters) {
    Matcher m = p1.matcher(n);
    while (m.find()) {
        System.out.println(m.group());
    }
}

Output

AAA
BBB

ABABAB

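Edit after comments

To match Hindi characters, use a Unicode block or script reference instead of a predefined or POSIX character class.

For example: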

Pattern p0 = Pattern.compile("(\\p{IsDevanagari})\\1{2,}");
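A minimal sketch of how the script-based pattern could be exercised. The class name, the `findRuns` helper and the sample string are made up for illustration; the sample is written with Unicode escapes (KA KA KA MA) so the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DevanagariRepeatDemo {

    // Same idea as p0, but the group matches one code point of the Devanagari script.
    static final Pattern DEVANAGARI_TRIPLE = Pattern.compile("(\\p{IsDevanagari})\\1{2,}");

    // Hypothetical helper: collects every match of DEVANAGARI_TRIPLE in the input.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = DEVANAGARI_TRIPLE.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Illustrative sample: the letter KA three times, followed by MA.
        String sample = "\u0915\u0915\u0915\u092E";
        // One run is expected: the three consecutive KA letters.
        System.out.println(findRuns(sample).size()); // prints 1
    }
}
```

Note that the group matches a single code point, so a letter followed by a combining vowel sign counts as two separate characters here.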

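Finally, I edited the quantifier after the back-reference (it was a greedy +, now a greedy {2,}) so that only three or more repetitions match.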
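To make the effect of that quantifier change concrete, here is a small sketch that runs both variants against one of the sample names. The class name and the `findAll` helper are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierComparison {

    // Hypothetical helper: returns every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String name = "ABBBAAS";
        // "+" after the back-reference: one or more extra copies, so doubles match too.
        System.out.println(findAll("(\\p{Alpha})\\1+", name));    // prints [BBB, AA]
        // "{2,}" after the back-reference: two or more extra copies, so only triples and longer match.
        System.out.println(findAll("(\\p{Alpha})\\1{2,}", name)); // prints [BBB]
    }
}
```

The captured group itself counts as the first occurrence, which is why {2,} on the back-reference corresponds to three or more occurrences in total.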

Answer 2

How about this? For tatarak loremipsrecdks RAAAJ , ABBBAAS, the output is

Found: tata
Found: AAA
Found: BBB
Found: AA

Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DublicatePattern {


    public static void main(String[] args) {
        String value = "tatarak loremipsrecdks RAAAJ , ABBBAAS";
        Pattern p = Pattern.compile("(\\w+)\\1+");
        Matcher m = p.matcher(value);
        while (m.find()) {
            System.out.println("Found: " + value.substring(m.start(), m.end()));
        }
    }
}
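The pattern above also reports sequences that occur only twice in a row (tata, AA), while the question asks for three or more. One possible adaptation, sketched below, builds the quantifier from a minimum repeat count. `RepeatedSequenceFinder`, `findRepeats` and `minRepeats` are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSequenceFinder {

    // Hypothetical helper: finds runs of word characters that occur
    // at least minRepeats times consecutively.
    static List<String> findRepeats(String text, int minRepeats) {
        if (minRepeats < 2) {
            throw new IllegalArgumentException("minRepeats must be at least 2");
        }
        // The group is the first copy; the back-reference must supply the remaining minRepeats - 1.
        Pattern p = Pattern.compile("(\\w+)\\1{" + (minRepeats - 1) + ",}");
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String value = "tatarak loremipsrecdks RAAAJ , ABBBAAS";
        // Two or more copies: equivalent to the (\w+)\1+ pattern above.
        System.out.println(findRepeats(value, 2)); // prints [tata, AAA, BBB, AA]
        // Three or more copies: what the question asks for.
        System.out.println(findRepeats(value, 3)); // prints [AAA, BBB]
    }
}
```

With a minimum of three, tatarak no longer matches because ta appears only twice in a row.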

Regex to find a pattern repeated consecutively more than twice
