I want to allow two main wildcards, ? and *, to filter my data.
Here is what I am doing now (as I have seen on many websites):
public boolean contains(String data, String filter) {
if(data == null || data.isEmpty()) {
return false;
}
String regex = filter.replace(".", "[.]")
.replace("?", ".")
.replace("*", ".*");
return Pattern.matches(regex, data);
}
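But shouldn't we escape all the other regex special characters, such as | or (, and so on? Also, could we preserve ? and * when they are preceded by a \? For example, something like: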
filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
.replaceAll("([^\\\\]|^)\\?", "$1.") // 2. replace any ? that isn't preceded by a \ by .
.replaceAll("([^\\\\]|^)\\*", "$1.*") // 3. replace any * that isn't preceded by a \ by .*
.replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1"); // 4. replace any \ that isn't followed by a ? or a * (possibly due to step 2 and 3) by \\
What do you think? If you agree, am I missing any other regex special characters?
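The concern about unescaped metacharacters can be reproduced with the first version of contains shown above. The following is only a minimal sketch: the class name NaiveWildcardDemo and the helper compiles are hypothetical names introduced for illustration, and the conversion itself is copied unchanged from that first version.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class NaiveWildcardDemo {
    // Same conversion as the first contains(...) above: only '.', '?' and '*' are handled.
    static boolean contains(String data, String filter) {
        if (data == null || data.isEmpty()) {
            return false;
        }
        String regex = filter.replace(".", "[.]")
                             .replace("?", ".")
                             .replace("*", ".*");
        return Pattern.matches(regex, data);
    }

    // Hypothetical helper: reports whether the converted filter is a valid regex at all.
    static boolean compiles(String filter) {
        try {
            contains("x", filter);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(contains("a", "a|b"));   // '|' is treated as alternation
        System.out.println(contains("a+b", "a+b")); // '+' is treated as a quantifier
        System.out.println(compiles("("));          // '(' opens a group that is never closed
    }
}
```

With these inputs, the filter a|b accepts the data a, the filter a+b rejects the literal data a+b, and the filter ( does not compile at all, which is the behaviour the escaping step is meant to prevent.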
Edit #1 (after taking dan1111's and m.buettner's suggestions into account):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars, but \, ? and *
regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
How about this?
Edit #2 (after taking dan1111's suggestion into account):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars (if not already escaped by user), but \, ? and *
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
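Is the goal in sight?
Answer 0 (score: 2)
You don't need 4 backslashes in the replacement string to write out a single backslash. Two backslashes are enough.
You can use a negative lookbehind to avoid the ([^\\\\]|^) in the pattern and the $1 in the replacement string: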
filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\$1") // 1. escape regex special chars, but ?, * and \
.replaceAll("(?<!\\\\)[?]", ".") // 2. replace any ? that isn't preceded by a \ by .
.replaceAll("(?<!\\\\)[*]", ".*") // 3. replace any * that isn't preceded by a \ by .*
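I don't really see what you need the last step for. Wouldn't it escape the backslashes that escape your metacharacters (which, in turn, would leave them unescaped)? I'm ignoring the fact that your replace call would write out 4 backslashes instead of only two. But say your original input is th|is. Your first replacement turns it into th\|is. The last replacement would then turn that into th\\|is, which matches either th followed by a backslash, or is.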
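You need to distinguish between how a string is written in code (uncompiled, with twice as many backslashes) and what it contains after compilation (with only half the backslashes).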
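The difference between the source literal and the runtime string can be checked directly. The sketch below uses a hypothetical class BackslashLevels and hypothetical method names. It relies on one Java-specific detail that is worth verifying against your own setup: in String.replaceAll the backslash is also an escape character inside the replacement string, so in Java the count of backslashes needed there differs from languages where the replacement is taken literally.

```java
import java.util.regex.Pattern;

final class BackslashLevels {
    // Source literal "\\\\$1" is the runtime string \\$1:
    // an escaped backslash followed by a reference to group 1.
    static String escapeWithFour(String s) {
        return s.replaceAll("([|.])", "\\\\$1");
    }

    // Source literal "\\$1" is the runtime string \$1:
    // an escaped dollar sign, so no group reference is left.
    static String escapeWithTwo(String s) {
        return s.replaceAll("([|.])", "\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escapeWithFour("th|is")); // th\|is
        System.out.println(escapeWithTwo("th|is"));  // th$1is

        // Regex th\|is: the pipe is a literal character.
        System.out.println(Pattern.matches("th\\|is", "th|is"));   // true
        // Regex th\\|is: "th" plus a backslash, or "is".
        System.out.println(Pattern.matches("th\\\\|is", "is"));    // true
        System.out.println(Pattern.matches("th\\\\|is", "th|is")); // false
    }
}
```

The last two calls show the th|is case described above: once the escaping backslash is itself escaped, the pipe becomes an alternation again.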
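You might also want to think about limiting the number of * allowed. A regex like .*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*! (where no ! is found in the input) could take a very long time to run. This problem is known as catastrophic backtracking.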
Answer 1 (score: 0)
The solution I finally went with (using the Apache Commons Lang library):
public static boolean isFiltered(String data, String filter) {
// no filter: return true
if (StringUtils.isBlank(filter)) {
return true;
}
// a filter but no data: return false
else if (StringUtils.isBlank(data)) {
return false;
}
// a filter and a data:
else {
// case insensitive
data = data.toLowerCase();
filter = filter.toLowerCase();
// .matches() auto-anchors, so add [*] (i.e. "containing")
String regex = "*" + filter + "*";
// replace any pair of backslashes by [*]
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// minimize unescaped redundant wildcards
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape unescaped regexps special chars, but [\], [?] and [*]
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace unescaped [?] by [.]
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace unescaped [*] by [.*]
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
// return whether data matches regex or not
return data.matches(regex);
}
}
Many thanks to @dan1111 and @m.buettner for their valuable help ;)
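To try the behaviour of isFiltered without the library on the classpath, a self-contained variant can be sketched as follows. The class FilterDemo and its isBlank helper are hypothetical. The helper is only an approximate stand-in for StringUtils.isBlank (it treats null, empty, and whitespace-only strings as blank), while the replacement chain is the same as in the method above.

```java
final class FilterDemo {
    // Approximate stand-in for StringUtils.isBlank (assumption, not the library code).
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static boolean isFiltered(String data, String filter) {
        if (isBlank(filter)) {
            return true;   // no filter: everything passes
        }
        if (isBlank(data)) {
            return false;  // a filter but no data
        }
        // Wrap in '*' so that matches() behaves like "contains".
        String regex = "*" + filter.toLowerCase() + "*";
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        return data.toLowerCase().matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isFiltered("Hello World", "wor?d")); // true
        System.out.println(isFiltered("price (USD)", "(usd)")); // true
        System.out.println(isFiltered("abc", "a.c"));           // false
    }
}
```

The third call returns false because the dot in the filter is escaped and therefore only matches a literal dot in the data.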
Answer 2 (score: 0)
Try this simpler version:
String regex = Pattern.quote(filter).replace("*", "\\E.*\\Q").replace("?", "\\E.\\Q");
This quotes the whole filter with \Q and \E, then stops the quoting at * and ?, replacing them with their pattern equivalents (.* and .).
I tested it with:
String simplePattern = "ab*g\\Ei\\.lmn?p";
String data = "abcdefg\\Ei\\.lmnop";
String quotedPattern = Pattern.quote(simplePattern);
System.out.println(quotedPattern);
String regex = quotedPattern.replace("*", "\\E.*\\Q").replace("?", "\\E.\\Q");
System.out.println(regex);
System.out.println(data.matches(regex));
Output:
\Qab*g\E\\E\Qi\.lmn?p\E
\Qab\E.*\Qg\E\\E\Qi\.lmn\E.\Qp\E
true
Note that this is based on Oracle's implementation of Pattern.quote; I don't know whether there are other valid implementations.
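Wrapped into reusable methods, the quoting approach might look like the sketch below. The class QuotedWildcard and the methods toRegex and matches are hypothetical names. Note that this variant offers no way to escape a wildcard: every * and ? in the filter becomes a pattern, even when preceded by a backslash.

```java
import java.util.regex.Pattern;

final class QuotedWildcard {
    // Quote the whole filter, then reopen the quoting around each wildcard.
    static String toRegex(String filter) {
        return Pattern.quote(filter)
                      .replace("*", "\\E.*\\Q")
                      .replace("?", "\\E.\\Q");
    }

    // Full match, as with String.matches: the filter must cover the whole data string.
    static boolean matches(String data, String filter) {
        return data.matches(toRegex(filter));
    }

    public static void main(String[] args) {
        System.out.println(matches("abcdefg", "ab*g")); // true
        System.out.println(matches("abc", "a.c"));      // false: '.' is literal
        System.out.println(matches("a", "a|b"));        // false: '|' is literal
        System.out.println(matches("(x)", "(?)"));      // true
    }
}
```

Because everything other than the two wildcards stays inside \Q...\E, there is no list of special characters to maintain.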