Pooling issue: item borrowed more than once

Date: 2012-01-19 22:05:31

Tags: java multithreading

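I have a utility method (i.e. a static one) that I call a lot and that uses java.util.regex.Matcher. Since the regular expressions passed in are reused a lot, I try not to compile them every time, so I keep them in a Map where the key is the regexp and the value is a list (a Queue in the code below) of Matcher objects, so that each thread gets its own Matcher instance.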

How does the following code snippet manage to return the same Matcher twice... sometimes?

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTest {
    private static final Map<String, Queue<Matcher>> matchers = new HashMap<String, Queue<Matcher>>();
    private static Set<Integer> duplicateHunter = new HashSet<Integer>();

    private static Matcher getMatcher(String regexp, String value) {
        Queue<Matcher> matcherQueue = matchers.get(regexp);

        if (matcherQueue == null) {
            synchronized (matchers) {
                matcherQueue = matchers.get(regexp);

                if (matcherQueue == null) {
                    // Create a new thread-safe Queue and a new Matcher
                    matcherQueue = new ConcurrentLinkedQueue<Matcher>();
                    matchers.put(regexp, matcherQueue);
                } // Else: another thread already did what needed to be done
            }
        }

        // Try to retrieve a Matcher
        Matcher matcher = matcherQueue.poll();

        if (matcher == null) {
            // No matchers available, create one
            // No lock needed, as it's not a drama to have a few more matchers in the queue
            Pattern pattern = Pattern.compile(regexp);
            matcher = pattern.matcher(value);
            matcherQueue.offer(matcher);
        } else {
            // reset the matcher
            matcher.reset(value);
        }

//        boolean notADuplicate = duplicateHunter.add(matcher.hashCode());
//        if(!notADuplicate) {
//            throw new RuntimeException("DUPLICATE!!!");
//        }

        return matcher;
    }

    private static void returnMatcher(String regex, Matcher matcher) {
        Queue<Matcher> matcherQueue = matchers.get(regex);
        //duplicateHunter.remove(matcher.hashCode());
        matcherQueue.offer(matcher);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            Thread thread = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 50000; i++) {
                        String regex = ".*";
                        Matcher matcher = null;
                        try {
                            matcher = getMatcher(regex, "toto" + i);
                            if (matcher.matches()) {
                                // matches
                            }
                        } finally {
                            if (matcher != null) {
                                returnMatcher(regex, matcher);
                            }
                        }
                    }
                }
            });
            thread.start();
        }
    }
}

You will get a "java.lang.StringIndexOutOfBoundsException: String index out of range". Enable the duplicateHunter code and you will see that a Matcher is indeed sometimes handed out twice.

(The static utility method itself is not shown; the main method is only there to demonstrate the problem.)

2 answers:

Answer 0 (score: 4)

If there is no matcher available for the regexp, you create a new one, but you also immediately add it to the queue:

if (matcher == null) {
    // No matchers available, create one
    // No lock needed, as it's not a drama to have a few more matchers in the queue
    Pattern pattern = Pattern.compile(regexp);
    matcher = pattern.matcher(value);
    matcherQueue.offer(matcher); // Don't add it to the queue here!
}

So the matcher sits in the queue while you are still using it, and another thread can easily grab it before you are done with it.
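A minimal sketch of the fix, assuming you keep the pooling approach: a newly created Matcher is handed straight to the caller and only enters the queue through returnMatcher(). The class name MatcherPool is hypothetical, and ConcurrentHashMap.computeIfAbsent (Java 8+) is swapped in for the question's double-checked lock around a plain HashMap, which is not safe to read concurrently with a put.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical corrected pool: a Matcher is in the queue only while nobody is using it
public class MatcherPool {
    // regexp -> idle Matchers that have been returned by their borrowers
    private static final Map<String, Queue<Matcher>> matchers = new ConcurrentHashMap<>();

    static Matcher getMatcher(String regexp, String value) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so no manual locking is needed
        Queue<Matcher> queue = matchers.computeIfAbsent(regexp, k -> new ConcurrentLinkedQueue<>());
        Matcher matcher = queue.poll();
        if (matcher == null) {
            // Create a fresh Matcher but do NOT offer it: the caller owns it until returnMatcher()
            return Pattern.compile(regexp).matcher(value);
        }
        // Reuse an idle Matcher on the new input
        return matcher.reset(value);
    }

    static void returnMatcher(String regexp, Matcher matcher) {
        // The Matcher becomes available to other threads only once its borrower is done with it
        matchers.computeIfAbsent(regexp, k -> new ConcurrentLinkedQueue<>()).offer(matcher);
    }

    public static void main(String[] args) {
        Matcher a = getMatcher(".*", "toto1");
        Matcher b = getMatcher(".*", "toto2");
        System.out.println(a != b);   // true: two borrowers, two instances
        returnMatcher(".*", a);
        Matcher c = getMatcher(".*", "toto3");
        System.out.println(c == a);   // true: the returned instance is reused
    }
}
```

The key change is the removed offer() in the "create" branch; everything else keeps the question's structure.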

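I'm not sure I agree with the idea of pooling matchers at all. They are not very expensive to create in terms of CPU cycles; you may want to profile it to see whether the pool is worth it. Pre-compiling the Pattern, however, is a good idea.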
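A minimal sketch of the "pre-compile the Pattern, create Matchers freely" alternative. The class name PatternCache is hypothetical. It relies on the documented fact that Pattern instances are immutable and safe for use by multiple threads, while Matcher instances are not, so each call gets its own Matcher and nothing has to be returned.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: cache compiled Patterns, never share Matchers
public class PatternCache {
    // Pattern is immutable and thread-safe, so one compiled instance can serve all threads
    private static final Map<String, Pattern> patterns = new ConcurrentHashMap<>();

    static Matcher getMatcher(String regexp, String value) {
        // Compile at most once per regexp (computeIfAbsent is atomic), then create a
        // cheap Matcher that stays confined to the calling thread
        return patterns.computeIfAbsent(regexp, Pattern::compile).matcher(value);
    }

    public static void main(String[] args) {
        System.out.println(getMatcher(".*", "toto1").matches()); // true
        System.out.println(getMatcher("t.t", "toto").matches()); // false: matches() needs the whole input
    }
}
```

With this design there is no returnMatcher() step, so a Matcher can never be borrowed twice.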

Answer 1 (score: 1)

When you create a new Matcher, you offer it to the Queue before returning it, so the next thread that asks for one gets it immediately.

matcher = pattern.matcher(value);  
matcherQueue.offer(matcher);        // <-- this line should be taken out and shot

...

return matcher;
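The bug does not actually need two threads to show up. A minimal single-threaded sketch, assuming a pool for one regexp only (the class BorrowedTwice and method buggyGetMatcher are hypothetical names), reproduces the question's logic: because the new Matcher is offered before it is returned, a second call made before any returnMatcher() gets the very same instance.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reproduction of the question's getMatcher logic for a single regexp
public class BorrowedTwice {
    private final Queue<Matcher> queue = new ConcurrentLinkedQueue<>();

    Matcher buggyGetMatcher(String regexp, String value) {
        Matcher matcher = queue.poll();
        if (matcher == null) {
            matcher = Pattern.compile(regexp).matcher(value);
            queue.offer(matcher); // the bug: the Matcher looks idle while it is still in use
        } else {
            matcher.reset(value);
        }
        return matcher;
    }

    public static void main(String[] args) {
        BorrowedTwice pool = new BorrowedTwice();
        Matcher first = pool.buggyGetMatcher(".*", "toto1");
        // Nothing has been returned yet, but a second borrower still gets the same instance
        Matcher second = pool.buggyGetMatcher(".*", "toto2");
        System.out.println(first == second); // true
    }
}
```

In the multithreaded test the second borrower is another thread, whose reset() swaps the input out from under the first one, which is consistent with the StringIndexOutOfBoundsException in the question.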

Also, your duplicateHunter HashSet is not thread-safe, so it may give you wrong results when you use it to check for duplicates.
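A minimal sketch of a thread-safe duplicate check, assuming Java 8+ for ConcurrentHashMap.newKeySet(). The class DuplicateHunter and the methods checkOut/checkIn are hypothetical names. It stores the Matcher objects themselves rather than their hashCode(): Matcher does not override equals/hashCode, so the set compares by identity, whereas two distinct objects are not guaranteed to have different hash codes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical thread-safe replacement for the question's duplicateHunter
public class DuplicateHunter {
    // Concurrent set; Matcher uses Object's identity equals/hashCode
    private static final Set<Matcher> borrowed = ConcurrentHashMap.newKeySet();

    // Call right after a Matcher is handed out
    static void checkOut(Matcher matcher) {
        // add() returns false if this exact instance is already checked out
        if (!borrowed.add(matcher)) {
            throw new IllegalStateException("DUPLICATE!!!");
        }
    }

    // Call BEFORE offering the Matcher back to the queue; otherwise another thread
    // could poll it and call checkOut() while it is still recorded as borrowed
    static void checkIn(Matcher matcher) {
        borrowed.remove(matcher);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile(".*").matcher("toto");
        checkOut(m);
        checkIn(m);
        checkOut(m); // fine: it was returned in between
        try {
            checkOut(m); // second checkout without a return
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // DUPLICATE!!!
        }
    }
}
```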