Question

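I am writing a Java program in which I need to search a Set for a particular word. The word to search for looks like "wo.d", where the '.' can stand for any letter. I am using a regular expression to match words of this kind.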

Here is what I have so far:

HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for(String setWord : words){
        m = p.matcher(setWord);
        if(m.matches())
            match = true;
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else{
    System.out.println("The word does not contain regex do other stuff");
}

The code above works, but it is not efficient: it is called many times per second, so it makes the program lag.

Answer 1

You need to stop iterating as soon as you get a match. Assuming you use Java 8, your for loop can be rewritten efficiently as follows:

boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());

You can also use parallelStream() instead of stream() to parallelize the search, especially if the Set contains many words.
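As a minimal, self-contained sketch of the stream approach: the class name StreamRegexSearch, the method containsMatch, and the sample words are made up for illustration. It assumes the Pattern is compiled once and reused across calls, which is safe because Pattern instances are immutable, while each call to matcher() creates a fresh Matcher. Keep in mind that '.' in a regex matches any character, not only letters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class StreamRegexSearch {

    // Returns true if any word in the set fully matches the pattern.
    // anyMatch is short-circuiting, so it does not need to visit
    // every element once a match has been found.
    static boolean containsMatch(Set<String> words, Pattern p, boolean parallel) {
        return (parallel ? words.parallelStream() : words.stream())
                .anyMatch(w -> p.matcher(w).matches());
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Set<String> words = new HashSet<>(Arrays.asList("test", "toast", "word"));
        Pattern p = Pattern.compile("t.st"); // compile once, reuse for every call
        System.out.println(containsMatch(words, p, false));
        System.out.println(containsMatch(words, p, true));
    }
}
```

Whether the parallel variant actually helps depends on the size of the set; for small sets the overhead of splitting the work may outweigh the gain, so it is worth measuring both.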

If you don't use Java 8, you can still do it with FluentIterable from Google Guava, but unfortunately without the ability to parallelize the search.

boolean match = FluentIterable.from(words).anyMatch(
    new Predicate<String>() {
        @Override
        public boolean apply(@Nullable final String w) {
            return p.matcher(w).matches();
        }
    }
);

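But in your case, I don't think using FluentIterable is any more worthwhile than simply adding a break once a match is found, since the break version is still easier to read and maintain: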

if (p.matcher(setWord).matches()) {
    match = true;
    break;
}

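So, if you really need to use a regular expression and you cannot use Java 8, your best option is to use break as shown above; there is no magic trick to consider.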

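Assuming you have only one character to replace, you can do it with startsWith(String) and endsWith(String), which should be much faster than a regular expression. Something like this: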

// Your words should be in a TreeSet to be already sorted alphabetically 
// in order to get a match as fast as possible
Set<String> words = new TreeSet<String>(); //this set is already populated
int index = word.indexOf('.');
if (index != -1) {
    String prefix = word.substring(0, index);
    String suffix = word.substring(index + 1);
    boolean match = false;
    for (String setWord : words){
        // From the fastest to the slowest thing to check 
        // to get the best possible performances
        if (setWord.length() == word.length() 
            && setWord.startsWith(prefix) 
            && setWord.endsWith(suffix)) {
            match = true;
            break;
        }
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else {
    System.out.println("The word does not contain regex do other stuff");
}
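The same check can be pulled out into a small helper, which also makes it easier to see why the length comparison is needed. This is only a sketch: the class name SingleWildcard and the method matches are hypothetical, and it assumes the pattern contains exactly one '.'. Without the length check, a short candidate whose prefix and suffix overlap (such as "aba" against "ab.ba") would be wrongly accepted.

```java
final class SingleWildcard {

    // Assumes the pattern contains exactly one '.', which stands for
    // any single character.
    static boolean matches(String pattern, String candidate) {
        int index = pattern.indexOf('.');
        if (index == -1) {
            return pattern.equals(candidate); // no wildcard: plain equality
        }
        String prefix = pattern.substring(0, index);
        String suffix = pattern.substring(index + 1);
        // Cheapest check first: the lengths must be identical.
        return candidate.length() == pattern.length()
                && candidate.startsWith(prefix)
                && candidate.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(matches("t.st", "test"));  // same length, prefix "t", suffix "st"
        System.out.println(matches("t.st", "toast")); // prefix and suffix fit, length does not
        System.out.println(matches("ab.ba", "aba"));  // prefix and suffix would overlap
    }
}
```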

Answer 2

Use a TreeSet instead of a HashSet, and test only a sub-range of the set.

TreeSet<String> words = new TreeSet<>();// this set is already populated
String word = "t.st";
if (word.contains(".")) {
    String from = word.replaceFirst("\\..*", "");
    String to = from + '\uffff';
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for (String setWord : words.subSet(from, to)) {
        m = p.matcher(setWord);
        if (m.matches()) {
            match = true;
            break;
        }
    }
    if (match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
} else {
    System.out.println("The word does not contain regex do other stuff");
}

In this case, words.subSet(from, to) contains only the words that start with "t".
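To make the effect of the range visible, here is a small sketch that returns the candidate words separately from the match test. The class name PrefixRangeSearch, its method names, and the sample words are hypothetical. It assumes the wildcard contains at least one '.' and that no stored word contains the character '\uffff'. Note that if the wildcard starts with '.', the prefix is empty and the range covers essentially the whole set, so nothing is gained in that case.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class PrefixRangeSearch {

    // Returns a view of the words that share the literal prefix found
    // before the first '.' of the wildcard.
    static SortedSet<String> candidates(TreeSet<String> words, String wildcard) {
        String from = wildcard.substring(0, wildcard.indexOf('.'));
        String to = from + '\uffff';
        return words.subSet(from, to); // half-open range [from, to)
    }

    // Runs the regex only against the candidates and stops at the first match.
    static boolean containsMatch(TreeSet<String> words, String wildcard) {
        Pattern p = Pattern.compile(wildcard);
        for (String w : candidates(words, wildcard)) {
            if (p.matcher(w).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        TreeSet<String> words = new TreeSet<>(
                Arrays.asList("apple", "test", "tent", "toast", "zebra"));
        System.out.println(candidates(words, "t.st"));
        System.out.println(containsMatch(words, "t.st"));
    }
}
```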

Answer 3

Simply break out of the loop once you get a match, to stop any further regex matching against the HashSet:

if(m.matches()) {
   match = true;
   break;
}

Full code:

HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for(String setWord : words){
        m = p.matcher(setWord);
        if(m.matches()) {
            match = true;
            break;
        }
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else{
    System.out.println("The word does not contain regex do other stuff");
}

Answer 4

Use a primitive matching method like this:

static boolean match(String wild, String s) {
    int len = wild.length();
    if (len != s.length())
        return false;
    for (int i = 0; i < len; ++i) {
        char w = wild.charAt(i);
        if (w == '.')
            continue;
        else if (w != s.charAt(i))
            return false;
    }
    return true;
}

and

HashSet<String> words = new HashSet<>();// this set is already populated
String word = "t.st";
boolean match = false;
if (word.contains(".")) {
    for (String setWord : words) {
        if (match(word, setWord)) {
            match = true;
            break;
        }
    }
    if (match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
} else {
    System.out.println("The word does not contain regex do other stuff");
}

How to use a regular expression when searching in a HashSet
