使用Java中的Trie确定电话号码前缀

时间:2015-06-08 10:52:25

标签: java treemap prefix trie

我正在尝试创建一个快速搜索功能来确定电话号码的前缀。

我将数据库中的前缀数据作为TreeMap加载到内存中,其中key是prefix,value是包含有关此前缀(country等)的信息的对象。

这就是TreeMap的填充方式:

private static TreeMap<String, PrefixData> prefixMap = new TreeMap<>();
//EXAMPLE of DB query
    try {
        Class.forName("org.postgresql.Driver");
        Connection connection = DriverManager.getConnection(dbURL, dbUser, dbPass);
        connection.setAutoCommit(false);
        Statement stmt = connection.createStatement();

        ResultSet rs = stmt.executeQuery("SELECT * FROM countries_prefixes");

        //Looping resultset
        while (rs.next()) {
            //TODO Review fields that must be stored in memory
            String country = rs.getString("name");

            //Populating map with data object (keeping nr prefix as key and object as value)
            prefixMap.put(rs.getString("country_prefix"), new PrefixData(country));
        }

        rs.close();
        stmt.close();
        connection.close();

    } catch (ClassNotFoundException | SQLException e) {
        System.out.println(e.toString());
    }

假设我有要查看的电话号码: 37251845632; 35844021546; 34651478966 等...

有些前缀长1位,长2位,长3位,依此类推......

所以我创建了一个循环,它起作用了:

//TODO Try some other search methods (Tries)
    //array for prefix length priority
    int[] sequence = {3, 4, 2, 1, 5, 6};

    //Performing search from the map
    for (int i = 0; i < sequence.length; i++) {

        //Extracting prefix from phone nr
        String prefix = phoneNr.substring(1, sequence[i] + 1);

        if (prefixMap.containsKey(prefix)) {
            PrefixData pdata = prefixMap.get(prefix);
            System.out.println(String.format("Found matching key [%s] in TreeMap", prefix));
            System.out.println(String.format("[NAME: %s] [REGEX: %s] ", pdata.getCountryName(), pdata.getNrRegex()));

            //Validate number format with regex
            if (pdata.getNrRegex().trim() != null && !pdata.getNrRegex().trim().isEmpty()) {
                System.out.println("Regex for number validation is present!");
                if (phoneNr.matches(pdata.getNrRegex().replaceAll("^/|/$", ""))) {
                    System.out.println("NUMBER IS VALID!");
                } else {
                    System.out.println("INVALID NUMBER!");
                }
            }

            return pdata;

        }
    }
    return null;
    }

现在循环效果很好,但速度很慢。我听说过有关Tries的一些内容,速度更快,但我不明白如何在我的场景中实现这一点。

感谢任何帮助!

1 个答案:

答案 0 :(得分:1)

正如我所说,循环有效,但这不是实现目标的好方法。

所以我做了一些研究,并提出了使用前缀树(Trie)实现的解决方案。

很少阅读Trie的内容here

现在是Trie的实施部分。我知道找到已经编写和测试过的代码会更快,所以我找到了Google实现here。和Vladimir Kroz的实施here

做了一些小修改,这就是解决方案。我将提供两种解决方案:

Prefixmap界面

package tiesImpl;

/**
 * Maps string prefixes to values. For example, if you {@code put("foo", 1)},
 * {@code get("foobar")} returns {@code 1}. Prohibits null values.
 *
 * <p>Use instead of iterating over a series of string prefixes calling
 * {@code String.startsWith(prefix)}.
 *
 * @author crazybob@google.com (Bob Lee)
 * @param <T>
 */

public interface PrefixMap<T> {
  /**
   * Maps prefix to value.
   *
     * @param prefix
     * @param value
   * @return The previous value stored for this prefix, or null if none.
   * @throws IllegalArgumentException if prefix is an empty string.
   */
  T put(CharSequence prefix, T value);

  /**
   * Finds a prefix that matches {@code s} and returns the mapped value.
   *
   * If multiple prefixes in the map match {@code s}, the longest match wins.
   *
     * @param s
   * @return value for prefix matching {@code s} or {@code null} if none match.
   */
  T get(CharSequence s);
}

PrefixTrie课程

package tiesImpl;

/*
 * Copyright (C) 2007 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Trie implementation supporting CharSequences as prefixes.
 *
 * Prefixes are sequences of characters, and the set of allowed characters is
 * specified as a range of sequential characters. By default, any seven-bit
 * character may appear in a prefix, and so the trie is a 128-ary tree.
 *
 * @author crazybob@google.com (Bob Lee)
 * @author mharris@google.com (Matthew Harris)
 * @param <T>
 */
public class PrefixTrie<T> implements PrefixMap<T> {
  // The set of allowed characters in prefixes is given by a range of
    // consecutive characters.  rangeOffset denotes the beginning of the range,
    // and rangeSize gives the number of characters in the range, which is used as
    // the number of children of each node.

private final char rangeOffset;
private final int rangeSize;

private final Node<T> root;

/**
 * Constructs a trie for holding strings of seven-bit characters.
 */
public PrefixTrie() {
    rangeOffset = '\0';
    rangeSize = 128;
    root = new Node<>(rangeSize);
}

/**
 * Constructs a trie for holding strings of characters.
 *
 * The set of characters allowed in prefixes is given by the range
 * [rangeOffset, lastCharInRange], inclusive.
 *
 * @param firstCharInRange
 * @param lastCharInRange
 */
public PrefixTrie(char firstCharInRange, char lastCharInRange) {
    this.rangeOffset = firstCharInRange;
    this.rangeSize = lastCharInRange - firstCharInRange + 1;

    if (rangeSize <= 0) {
        throw new IllegalArgumentException("Char range must include some chars");
    }

    root = new Node<>(rangeSize);
}

/**
 * {@inheritDoc}
 *
 * @param prefix
 * @param value
 * @throws IllegalArgumentException if prefix contains a character outside
 * the range of legal prefix characters.
 */
@Override
public T put(CharSequence prefix, T value) {
    if (value == null) {
        throw new NullPointerException();
    }

    Node<T> current = root;
    for (int i = 0; i < prefix.length(); i++) {
        int nodeIndex = prefix.charAt(i) - rangeOffset;
        try {
            Node<T> next = current.next[nodeIndex];
            if (next == null) {
                next = current.next[nodeIndex] = new Node<>(rangeSize);
            }
            current = next;
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IllegalArgumentException(
                    "'" + prefix.charAt(i) + "' is not a legal prefix character.");
        }
    }
    T oldValue = current.value;
    current.value = value;
    return oldValue;
}

/**
 * {@inheritDoc}
 * @param s
 */
@Override
public T get(CharSequence s) {
    Node<T> deepestWithValue = root;
    Node<T> current = root;
    for (int i = 0; i < s.length(); i++) {
        int nodeIndex = s.charAt(i) - rangeOffset;
        if (nodeIndex < 0 || rangeSize <= nodeIndex) {
            return null;
        }
        current = current.next[nodeIndex];
        if (current == null) {
            break;
        }
        if (current.value != null) {
            deepestWithValue = current;
        }
    }
    return deepestWithValue.value;
}

/**
 * Returns a Map containing the same data as this structure.
 *
 * This implementation constructs and populates an entirely new map rather
 * than providing a map view on the trie, so this is mostly useful for
 * debugging.
 *
 * @return A Map mapping each prefix to its corresponding value.
 */
public Map<String, T> toMap() {
    Map<String, T> map = newLinkedHashMap();
    addEntries(root, new StringBuilder(), map);
    return map;
}

/**
 * Adds to the given map all entries at or below the given node.
 *
 * @param node
 * @param builder A StringBuilder containing the prefix for the given node.
 * @param map
 */
private void addEntries(Node<T> node,
        StringBuilder builder,
        Map<String, T> map) {
    if (node.value != null) {
        map.put(builder.toString(), node.value);
    }

    for (int i = 0; i < node.next.length; i++) {
        Node<T> next = node.next[i];
        if (next != null) {
            builder.append((char) (i + rangeOffset));
            addEntries(next, builder, map);
            builder.deleteCharAt(builder.length() - 1);
        }
    }
}

private static class Node<T> {

    T value;
    final Node<T>[] next;

    @SuppressWarnings("unchecked")
    Node(int numChildren) {
        next = new Node[numChildren];
    }
}

/**
 * Creates a {@code LinkedHashMap} instance.
 *
 * @param <K>
 * @param <V>
 * @return a newly-created, initially-empty {@code LinkedHashMap}
 */
public static <K, V> LinkedHashMap<K, V> newLinkedHashMap() {
    return new LinkedHashMap<>();
}

}

Vladimir Kroz实施:Trie class

package tiesImpl;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Prefix table based on Trie structure. Allows to perform incremental lookup
 * and match based on search key prefixes (classic example - determine phone
 * area code for given phone number)
 *
 * @param <V> a type of value object to be stored along with prefix (e.g when
 * key is a country name, the value could be a name of the country)
 *
 * @author Vladimir Kroz
 * https://vkroz.wordpress.com/2012/03/23/prefix-table-trie-implementation-in-java/
 */

public class Trie<V> {

Entry<V> entry;
char key;
Map<Character, Trie<V>> children;

public Trie() {
    this.children = new HashMap<>(10);
    entry = new Entry<>();
}

/**
 * non-public, used by _put()
 */
Trie(char key) {
    this.children = new HashMap<>(10);
    this.key = key;
    entry = new Entry<>();
}

public void put(String key, V value) {
    _put(new StringBuffer(key), new StringBuffer(""), value);
}

void _put(StringBuffer remainder, StringBuffer prefix, V value) {
    if (remainder.length() > 0) {
        char keyElement = remainder.charAt(0);
        Trie<V> t = null;
        try {
            t = children.get(keyElement);
        } catch (IndexOutOfBoundsException e) {
        }
        if (t == null) {
            t = new Trie<>(keyElement);
            children.put(keyElement, t);
        }
        prefix.append(remainder.charAt(0));
        t._put(remainder.deleteCharAt(0), prefix, value);
    } else {
        this.entry.value = value;
        this.entry.prefix = prefix.toString();
    }

}

/**
 * Retrieves element from prefix table matching as a prefix to provided
 * key. E.g. if key is "37251656565" and prefix table has node "372" then
 * this call will return the value of "372"
 *
 * @param key a string which starts with prefix to be searched in the table
 * (e.g. phone number)
 * @return an Object associated with matching prefix (i.e if key is a phone
 * number it may return a corresponding country name)
 */
public V get(String key) {
    return _get(new StringBuffer(key), 0);
}

/**
 * Returns true if key has matching prefix in the table
 *
 * @param key
 * @return
 */
public boolean hasPrefix(String key) {
    return this.get(key) != null;
}

V _get(StringBuffer key, int level) {
    if (key.length() > 0) {
        Trie<V> t = children.get(key.charAt(0));
        if (t != null) {
            //FYI: modified code to return deepest with value instead of returning null if prefix doesn't have corresponding value.
            V result = t._get(key.deleteCharAt(0), ++level);
            return result == null ? entry.value : result;

        } else {
            return (level > 0) ? entry.value : null;
        }

    } else {
        return entry.value;
    }
}

@Override
//For debugging
public String toString() {

    Iterator<Character> it = children.keySet().iterator();
    StringBuffer childs = new StringBuffer();

    while (it.hasNext()) {
        Character _key = it.next();
        childs.append(String.format("\n%s\n",
                //Adding a tab to the beginning of every line to create a visual tree
                String.format("%s: %s", _key, children.get(_key)).replaceAll("(?m)(^)", "\t")));
    }

    return String.format("Trie [entry=%s, children=%s]", entry, childs);
}

static public class Entry<V> {

    String prefix;
    V value;

    public Entry() {
    }

    public Entry(String p, V v) {
        prefix = p;
        value = v;
    }

    public String prefix() {
        return prefix;
    }

    public V value() {
        return value;
    }

    @Override
    public String toString() {
        return "Entry [prefix=" + prefix + ", value=" + value + "]";
    }

}
}

最后是测试部分

package tiesImpl;

/**
 * Class for testing different implementations of prefix tree (Trie).
 *
 * @author lkallas
 */

public class TriesTest {

private static final PrefixTrie<String> googleTrie = new PrefixTrie<>();
private static final Trie<String> krozTrie = new Trie<>();

public static void main(String[] args) {

    //Inserting prefixes to Google implementation of Trie
    googleTrie.put("7", "Russia");
    googleTrie.put("77", "Abkhazia");
    googleTrie.put("746", "Some unknown country");

    //Inserting prefixes to Vladimir Kroz implementation of Trie
    krozTrie.put("7", "Russia");
    krozTrie.put("77", "Abkhazia");
    krozTrie.put("746", "Some unknown country");

    System.out.println("Google version of get: " + googleTrie.get("745878787"));
    System.out.println("Vladimir Kroz version of get: " + krozTrie.get("745878787"));

}

}

希望这个答案对其他人有所帮助! 干杯!