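Is it possible to use a regular expression to find a char sequence that repeats within a word?
I tried this:
\\b\\w*?(\\w{2})\\w*?\\1\\w*?\\b on 'tatarak'
It finds ta in the word, but it should match only when ta occurs three or more times.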
Answer 0 (score: 0)
Try using groups and back-references within the same Pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] namesWithRepeatedOneLetter = { "RAAAJ", "ABBBAAS" };
String[] namesWithRepeatedTwoLetters = { "ABABABC", "FOOBCBCD" };
// (\\p{Alpha}) : POSIX character class, basically your a-zA-Z range.
//                The parentheses define it as a group (group 1).
// \\1          : reference to the previously declared group 1.
// {2,}         : greedy quantifier, two or more further repeats,
//                so the letter occurs at least three times in a row.
Pattern p0 = Pattern.compile("(\\p{Alpha})\\1{2,}");
// (\\p{Alpha}{2,}) : the inner greedy {2,} makes the group two or more
//                    letters long, so a repetition is a repeated letter group.
Pattern p1 = Pattern.compile("(\\p{Alpha}{2,})\\1{2,}");
for (String n : namesWithRepeatedOneLetter) {
    Matcher m = p0.matcher(n);
    while (m.find()) {
        System.out.println(m.group());
    }
}
System.out.println();
for (String n : namesWithRepeatedTwoLetters) {
    Matcher m = p1.matcher(n);
    while (m.find()) {
        System.out.println(m.group());
    }
}
Output
AAA
BBB
ABABAB
Edit after comments
To match Hindi characters, use a Unicode block or script reference instead of a character class or POSIX class.
For example:
Pattern p0 = Pattern.compile("(\\p{IsDevanagari})\\1{2,}");
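To try the script-based pattern, here is a minimal sketch. The class name DevanagariRepeatDemo, the helper findRepeats, and the sample string are made up for illustration; the string is written with Unicode escapes so the source file encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DevanagariRepeatDemo {
    // Same shape as p0, but the group matches any character in the Devanagari script.
    static final Pattern REPEATED_DEVANAGARI =
            Pattern.compile("(\\p{IsDevanagari})\\1{2,}");

    // Hypothetical helper: collects every run of three or more identical characters.
    static List<String> findRepeats(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = REPEATED_DEVANAGARI.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample: the letter KA (U+0915) three times, then MA (U+092E).
        String sample = "\u0915\u0915\u0915\u092E";
        System.out.println(findRepeats(sample));
    }
}
```

Note that combining vowel signs also belong to the Devanagari script, so whether this matches what a reader would call a "repeated letter" depends on the text.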
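Finally, I edited the quantifier after the back-reference (it was a greedy +, now a greedy {2,}), so that only three or more repetitions match.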
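The difference between the two quantifiers is easy to see side by side. This is only a sketch; the class RepeatQuantifierDemo and the helper findAll are illustrative names, not part of the answer's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatQuantifierDemo {
    // Hypothetical helper: returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String name = "ABBBAAS";
        // \1+ accepts a single repeat, so the pair "AA" is reported too.
        System.out.println(findAll("(\\p{Alpha})\\1+", name));    // [BBB, AA]
        // \1{2,} needs two or more repeats after the group: three in a row.
        System.out.println(findAll("(\\p{Alpha})\\1{2,}", name)); // [BBB]
    }
}
```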
Answer 1 (score: 0)
How about this? For tatarak loremipsrecdks RAAAJ , ABBBAAS the output is
Found: tata
Found: AAA
Found: BBB
Found: AA
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DublicatePattern {
    public static void main(String[] args) {
        String value = "tatarak loremipsrecdks RAAAJ , ABBBAAS";
        // Group 1 is any run of word characters; \\1+ requires it to repeat at least once.
        Pattern p = Pattern.compile("(\\w+)\\1+");
        Matcher m = p.matcher(value);
        while (m.find()) {
            System.out.println("Found: " + value.substring(m.start(), m.end()));
        }
    }
}
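Note that the pattern above also reports sequences that occur only twice in a row, such as tata and AA. If the goal is the one described in the question (the same two characters at least three times in a word, not necessarily adjacent), one possible sketch extends the question's own attempt with a repeated non-capturing group. This is one reading of the requirement, and the names SpacedRepeatDemo and repeatedPair are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpacedRepeatDemo {
    // Assumed variant of the question's regex: after the captured pair,
    // (?:\w*?\1){2,} requires the same pair at least two more times in the word.
    static final Pattern THREE_OR_MORE =
            Pattern.compile("\\b\\w*?(\\w{2})(?:\\w*?\\1){2,}\\w*\\b");

    // Returns the repeated two-character sequence, or null if the input has none.
    static String repeatedPair(String word) {
        Matcher m = THREE_OR_MORE.matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedPair("tatarak"));   // null (ta occurs only twice)
        System.out.println(repeatedPair("tatatarak")); // ta
    }
}
```

Patterns with nested lazy quantifiers like this one can backtrack heavily on long words, so it is worth testing on realistic input before relying on it.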