Efficient data structure for matching one of a list of regular expressions

Asked: 2019-04-01 12:50:59

Tags: java regex search data-structures

I have a MyRegex class:

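import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyRegex {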
    private Pattern pattern;
    private Map<String, Object> attributes;
    public MyRegex() {
        this.pattern = null;
        this.attributes = new HashMap<String, Object>();
    }
    public MyRegex(String regex) {
        this.pattern = Pattern.compile(regex);
        this.attributes = new HashMap<String, Object>();
    }
    public MyRegex withRegex(String regex) {
        this.pattern = Pattern.compile(regex);
        return this;
    }
    public void putAttribute(String attrName, Object attrValue) {
        attributes.put(attrName, attrValue);
    }
    public Map<String, Object> getAttributes() {
        return Collections.unmodifiableMap(attributes);
    }
    public Pattern getPattern() {
        return pattern;
    }
    public Matcher matcher(CharSequence str) {
        return pattern.matcher(str);
    }
}

And I have this class:

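import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

// Note: `log` is a logger field that is not shown in this excerpt.
public class MyRegexMeta {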
    private String metaFileName;

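    // volatile so that a list swapped in by the reload thread is visible to
    // threads calling generateAttributes (assumes loadConfig assigns a new list).
    private volatile List<MyRegex> attrModelList;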

    public MyRegexMeta(String metaFileName) {
        this.metaFileName = metaFileName;
        loadConfig();
        startConfigLoader();
    }

    private void startConfigLoader() {
        new Thread(new ConfigLoader()).start();
        log.info("Config loader started");
    }

    private class ConfigLoader implements Runnable {
        public void run() {
            final int sleepSeconds = 7200;
            // Reload the config periodically so that it stays up to date.
            while (true) {
                try {
                    Thread.sleep(sleepSeconds * 1000);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop instead of looping forever.
                    log.error("Config loader interrupted; stopping. ", e);
                    Thread.currentThread().interrupt();
                    return;
                }
                loadConfig();
            }
        }
    }

    private void loadConfig() {
        // Read configuration from a file, and store in a list.
    }

    public void generateAttributes(String name, Map<String, Object> attrs) {
        List<MyRegex> tempModelList = attrModelList;
        Map<String, MyRegex> matched = new HashMap<String, MyRegex>();
        for (MyRegex model : tempModelList) {
            Matcher matcher = model.matcher(name);
            if (matcher.matches()) {
                for (String attrName : model.getAttributes().keySet()) {
                    matched.put(attrName, model);
                }
                // Copy this model's attributes; later matches overwrite earlier keys.
                attrs.putAll(model.getAttributes());
            }
        }
    }
}

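This code is part of a running service. Each line of the text file that I read in loadConfig has a regular expression and a set of attributes. I call generateAttributes, passing in a name and a Map that will hold the attributes corresponding to the regular expressions that match the name.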
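To make the matching behavior concrete, here is a minimal, self-contained sketch of the same loop. The `Rule` class, the sample regexes, and the attribute names are hypothetical stand-ins for what the config file provides (its real format is not shown here). The sketch illustrates that every matching regex contributes its attributes and that, because of `putAll`, a later match overwrites an earlier one on a shared key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class AttributeMergeDemo {
    /** Hypothetical stand-in for one config line: a regex plus its attributes. */
    static final class Rule {
        final Pattern pattern;
        final Map<String, Object> attributes = new HashMap<>();

        Rule(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        Rule with(String attrName, Object attrValue) {
            attributes.put(attrName, attrValue);
            return this;
        }
    }

    /** Same loop shape as generateAttributes: every matching rule contributes. */
    static void generateAttributes(List<Rule> rules, String name, Map<String, Object> attrs) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(name).matches()) {
                attrs.putAll(rule.attributes); // later rules overwrite earlier keys
            }
        }
    }

    /** Made-up rules, purely for illustration. */
    static List<Rule> sampleRules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("web-.*").with("tier", "frontend").with("owner", "team-a"));
        rules.add(new Rule(".*-prod").with("env", "prod").with("owner", "team-b"));
        return rules;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        generateAttributes(sampleRules(), "web-01-prod", attrs);
        // Both rules match, so attrs holds tier, env, and owner (owner from the second rule).
        System.out.println(attrs);
    }
}
```

Since all matching rules are merged, the loop cannot stop at the first match, which is why every call scans the whole list.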

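I call generateAttributes several times per second, so its speed matters. Also, my names are not evenly distributed. Suppose attrModelList holds 200 regex lines; perhaps 20 of them will match 70% of the names I pass in. I know that sequentially searching for matching regexes is O(n), but I would like the average case to be better than O(n).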

How can I make this code more efficient in the average case?
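One possible direction, shown only as a sketch and not as a tested answer: if the skew means that the same names are passed in repeatedly, the merged result can be memoized per name, so that a repeated name costs one hash lookup instead of a scan over every regex. The class `CachingAttributeMatcher`, its `addRule` method, and the `fullScans` counter are hypothetical names introduced for this illustration. The sketch assumes the cache may be cleared whenever the rules are reloaded, and it does not help if every incoming name is distinct.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class CachingAttributeMatcher {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<Map<String, Object>> attributes = new ArrayList<>();
    private final Map<String, Map<String, Object>> cache;
    int fullScans = 0; // exposed only so the demo can show cache hits

    CachingAttributeMatcher(final int maxEntries) {
        // accessOrder=true keeps least-recently-used entries first, and
        // removeEldestEntry evicts one entry whenever the cap is exceeded.
        this.cache = new LinkedHashMap<String, Map<String, Object>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Map<String, Object>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void addRule(String regex, Map<String, Object> attrs) {
        patterns.add(Pattern.compile(regex));
        attributes.add(new HashMap<>(attrs));
        cache.clear(); // cached results are stale once the rules change
    }

    // synchronized because an access-ordered LinkedHashMap is modified even by get().
    synchronized void generateAttributes(String name, Map<String, Object> out) {
        Map<String, Object> merged = cache.get(name);
        if (merged == null) {
            merged = new HashMap<>();
            for (int i = 0; i < patterns.size(); i++) {
                if (patterns.get(i).matcher(name).matches()) {
                    merged.putAll(attributes.get(i));
                }
            }
            fullScans++;
            cache.put(name, merged);
        }
        out.putAll(merged);
    }

    public static void main(String[] args) {
        CachingAttributeMatcher matcher = new CachingAttributeMatcher(1000);
        Map<String, Object> rule = new HashMap<>();
        rule.put("tier", "frontend");
        matcher.addRule("web-.*", rule);
        for (int i = 0; i < 3; i++) {
            matcher.generateAttributes("web-01", new HashMap<String, Object>());
        }
        System.out.println("full scans: " + matcher.fullScans); // prints "full scans: 1"
    }
}
```

The cap on cache size is a design choice to bound memory when many distinct names arrive; the value 1000 in the demo is arbitrary. Reordering the regex list by hit frequency alone would not shorten the scan here, because every matching regex has to contribute its attributes.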

0 answers