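I need to use java.util.regex to extract a string enclosed in single quotes. The string may contain Oracle-style escaped single quotes (a ' is escaped by doubling it: '').
For example, for qwerty 'uiop asdfg''hjklzxcvb' it must return 'uiop asdfg''hjklzxcvb'.
I have code that works for small strings, but when I try to parse a fairly large string I get java.lang.StackOverflowError. How can I rewrite my pattern so that it works with large strings?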
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    static String STRING_BETWEEN_QUOTES_PATTERN = "'(?:[^']|'')*'";
    static String queryString = "qwerty 'uiop asdfg''hjklzxcvb' ";

    public static void main(String[] args) {
        Pattern patternBetweenQuotes = Pattern.compile(STRING_BETWEEN_QUOTES_PATTERN);
        Matcher matcherBetweenQuotes = patternBetweenQuotes.matcher(queryString);
        while (matcherBetweenQuotes.find()) {
            System.out.println(matcherBetweenQuotes.group());
        }
    }
}
This code does not work for a string like this:
static String queryString =" qwerty' uiop'' asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm QWERTY QWERTY uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty UIO p asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty uiop asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb 
nmqwertyuiopasd fghjklzxcvbnm qwertyqwerty UI op asdfghjklzxcvb nmqw ertyuiopasdfghjklzxcvb nmqwertyuiopasd fghjklzxcvbnm qwerty' ";
Answer 0 (score: 2)
I suggest using this faster regex:
    final static String STRING_BETWEEN_QUOTES_PATTERN = "'(?:[^']+|'')*'";
It completes the match in only 15 steps (see the demo link), whereas '(?:[^']|'')*' takes a whopping 4185 steps.
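To see why the extra + helps, here is a minimal sketch that runs the suggested pattern against a generated long input. The class name GroupedRunsDemo and the helpers firstQuoted and longInput are made up for illustration and are not part of the answer; the stress input is an assumption about what a "large string" looks like.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupedRunsDemo {
    // Pattern from the answer: one pass through the group consumes a whole
    // run of non-quote characters instead of a single character.
    static final Pattern QUOTED = Pattern.compile("'(?:[^']+|'')*'");

    // Returns the first quoted literal found in the input, or null if none.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical stress input: a quoted literal holding one escaped quote
    // followed by n filler characters.
    static String longInput(int n) {
        StringBuilder sb = new StringBuilder("qwerty 'uiop''");
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.append("' tail").toString();
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("qwerty 'uiop asdfg''hjklzxcvb' "));
        // Length of the matched literal, including both surrounding quotes.
        System.out.println(firstQuoted(longInput(200_000)).length());
    }
}
```

With [^']+ the group is entered once per run of ordinary characters, so the number of group repetitions depends on how many escaped quotes the literal contains, not on its total length.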
Answer 1 (score: 1)
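You need to "unroll" it:
    "'[^']*(?:''[^']*)*'"
See the regex demo.
Your regex fails on longer strings because the alternation inside the quantified group requires a huge number of backtracking steps, one group repetition per character. The solution is to make the regex more "linear".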
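Here is the regex breakdown:
' - an opening apostrophe
[^']* - zero or more characters other than '
(?:''[^']*)* - zero or more sequences of:
    '' - two literal ' characters
    [^']* - zero or more characters other than '
' - a literal closing '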
With my regex, your sample input is matched in 11 steps, while '(?:''|[^'])*' finishes only after 6271 steps.
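As a rough illustration, the unrolled pattern can be dropped into the same find loop as in the question. The class name UnrolledQuotesDemo and the helpers findQuoted and longInput are hypothetical names chosen for this sketch, and the generated input is only an assumed stand-in for the large string in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledQuotesDemo {
    // Unrolled pattern from the answer: opening quote, a run of non-quotes,
    // then any number of (escaped quote + run of non-quotes), closing quote.
    static final Pattern QUOTED = Pattern.compile("'[^']*(?:''[^']*)*'");

    // Collects every quoted literal found in the input, in order.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Hypothetical stress input: n filler characters on each side of a
    // single escaped quote, all inside one quoted literal.
    static String longInput(int n) {
        StringBuilder sb = new StringBuilder("prefix '");
        for (int i = 0; i < n; i++) sb.append('x');
        sb.append("''");
        for (int i = 0; i < n; i++) sb.append('y');
        return sb.append("' suffix").toString();
    }

    public static void main(String[] args) {
        System.out.println(findQuoted("qwerty 'uiop asdfg''hjklzxcvb' and 'second'"));
        List<String> big = findQuoted(longInput(500_000));
        System.out.println(big.size() + " match, length " + big.get(0).length());
    }
}
```

The group in this pattern repeats once per escaped quote rather than once per character, which is why much longer inputs go through.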