How does HashMap provide constant-time performance?

Date: 2018-07-08 07:31:57

Tags: java algorithm collections hashmap

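This seems like a question that has been asked a million times, but I have been puzzled by it for a long time and still haven't found a satisfying answer.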

Suppose I have a HashMap containing 1100 elements, and assume the map has 1000 buckets.

When I insert a new element, the map first derives the hash of the key, say 676. It then goes to bucket 676 and places the object in that bucket as an EntryObject.
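
The insertion process described above can be sketched as a toy separate-chaining table. This is a simplified illustration under stated assumptions, not the JDK code: the class name `ToyHashMap`, the fixed 1000 buckets (taken from the assumption above), and the use of `Math.floorMod` to turn a hash code into a bucket index are all hypothetical choices. The real `HashMap` uses a power-of-two table, resizes, and may convert crowded bins into trees.

```java
import java.util.Objects;

// Hypothetical toy map illustrating "hash -> bucket -> entry".
// Not java.util.HashMap: fixed 1000 buckets, no resizing, no tree bins.
class ToyHashMap<K, V> {
    // Each bucket holds a singly linked list of entries (separate chaining).
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = (Entry<K, V>[]) new Entry[1000];

    // Maps a key's hash code to a bucket index in [0, buckets.length).
    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    public V put(K key, V value) {
        int i = indexFor(key); // one arithmetic step, no searching for the bucket
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) { // key already present: replace value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // prepend a new entry
        return null;
    }

    public V get(Object key) {
        // Only the (usually short) chain inside one bucket is walked.
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> m = new ToyHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("a", 3);
        System.out.println(m.get("a")); // prints 3
        System.out.println(m.get("c")); // prints null
    }
}
```

The keys `"Aa"` and `"BB"` have the same `hashCode()`, so they land in the same bucket and exercise the chain walk.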

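My question is: how does it get to bucket 676? I assume these buckets are indexed, i.e. ordered. Suppose I have a 1000-page book and want to turn to page 676. I can't open that page directly; judging by the thickness of the book, I can open it near page 676 and, after a few more tries, reach page 676 itself. Whether the book has 100 pages or 1,000,000 pages, a 1:10,000 difference in size, the number of attempts barely changes, but I still need a few tries before I land on the exact page.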

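So my question is: how does this happen in a HashMap? Also, if anyone could point me to resources for understanding its internal workings in depth, that would be very helpful.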

Thanks

2 answers:

Answer 0 (score: 3)

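It's an array lookup. When you access someArray[index], you don't flip through pages: the address is computed by multiplying the index by the size of one element and adding the result to the address of the first entry, and you are there in a single step.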
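
To make the "array lookup" point concrete, here is a minimal sketch assuming a power-of-two table length, as the `table` field's comment in the source below requires. `spread` copies the body of `HashMap.hash(Object)` from that source; `indexFor` is a hypothetical helper name for the `(n - 1) & hash` masking that, to my knowledge, OpenJDK 8+ writes inline in `getNode`/`putVal` (those methods fall outside the pasted excerpt). Either way, finding the bucket is arithmetic plus one array access, not a search.

```java
// Sketch: locating a bucket is index arithmetic plus a single array access.
class BucketIndexDemo {
    // Same bit-spreading step as HashMap.hash(Object) in the source below.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Hypothetical helper: for a power-of-two length n, (n - 1) & hash keeps
    // the low bits of hash, which is always a valid index in [0, n).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // DEFAULT_INITIAL_CAPACITY in the source below
        String[] table = new String[n];
        String key = "example";
        int i = indexFor(spread(key), n);
        table[i] = key; // direct store: no scanning of other slots
        System.out.println("bucket " + i + " -> " + table[i]);
        // For power-of-two n, masking agrees with floorMod, even for negative hashes.
        System.out.println(indexFor(-7, n) == Math.floorMod(-7, n)); // prints true
    }
}
```

Masking with `n - 1` instead of using `%` avoids a division and never yields a negative index, which is one reason the table length is kept a power of two.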

Answer 1 (score: -2)

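If the JRE/JDK is configured correctly in your IDE, you should be able to click through to any collection class to view its source code (in Eclipse, Command/Ctrl + left-click on the object's constructor).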

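I have uploaded the HashMap source code so you can look at the specific implementation (edit: I wanted to include the code from PasteBin here, but could only fit about 500 lines; the full implementation is available at the link):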

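Regarding your question about contiguous memory: the implementation notes state that the HashMap class is backed by a combination of bins and TreeBin/TreeNode data structures. In terms of memory, this data structure ends up being non-contiguous: each node/bin holds pointers to adjacent nodes/bins, which allows resizing to use addresses scattered throughout memory (at least, that is my understanding).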
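
A rough simulation of how short the bins stay, which is why following node links inside a bin does not undermine average constant-time lookups. Assumptions: the keys `key0` through `key1099` are hypothetical, only bin assignment is simulated (no real `HashMap` internals are touched), and the 2048-bucket size follows from the default 0.75 load factor (1024 × 0.75 = 768 < 1100, while 2048 × 0.75 = 1536 ≥ 1100).

```java
// Simulates bin assignment only; no real HashMap internals are used.
class BinLengthDemo {
    // Same bit-spreading step as HashMap.hash(Object).
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many of `count` hypothetical String keys fall into each
    // of n bins, where n is assumed to be a power of two.
    static int[] binSizes(int n, int count) {
        int[] sizes = new int[n];
        for (int k = 0; k < count; k++) {
            String key = "key" + k;
            sizes[(n - 1) & spread(key.hashCode())]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = binSizes(2048, 1100);
        int longest = 0;
        int empty = 0;
        for (int s : sizes) {
            longest = Math.max(longest, s);
            if (s == 0) {
                empty++;
            }
        }
        System.out.println("longest bin: " + longest + ", empty bins: " + empty);
    }
}
```

The exact numbers depend on the keys' `hashCode()` values, so no output is claimed here; the point is that a lookup touches one array slot and then only the few nodes in that bin.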

    /*
     * Copyright (c) 1997, 2017, Oracle and/or its affiliates. All rights reserved.
     * ORACLE PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     */

    package java.util;

    import java.io.IOException;
    import java.io.InvalidObjectException;
    import java.io.Serializable;
    import java.lang.reflect.ParameterizedType;
    import java.lang.reflect.Type;
    import java.util.function.BiConsumer;
    import java.util.function.BiFunction;
    import java.util.function.Consumer;
    import java.util.function.Function;
    import jdk.internal.misc.SharedSecrets;

    /**
     * Hash table based implementation of the {@code Map} interface.  This
     * implementation provides all of the optional map operations, and permits
     * {@code null} values and the {@code null} key.  (The {@code HashMap}
     * class is roughly equivalent to {@code Hashtable}, except that it is
     * unsynchronized and permits nulls.)  This class makes no guarantees as to
     * the order of the map; in particular, it does not guarantee that the order
     * will remain constant over time.
     *
     * <p>This implementation provides constant-time performance for the basic
     * operations ({@code get} and {@code put}), assuming the hash function
     * disperses the elements properly among the buckets.  Iteration over
     * collection views requires time proportional to the "capacity" of the
     * {@code HashMap} instance (the number of buckets) plus its size (the number
     * of key-value mappings).  Thus, it's very important not to set the initial
     * capacity too high (or the load factor too low) if iteration performance is
     * important.
     *
     * <p>An instance of {@code HashMap} has two parameters that affect its
     * performance: <i>initial capacity</i> and <i>load factor</i>.  The
     * <i>capacity</i> is the number of buckets in the hash table, and the initial
     * capacity is simply the capacity at the time the hash table is created.  The
     * <i>load factor</i> is a measure of how full the hash table is allowed to
     * get before its capacity is automatically increased.  When the number of
     * entries in the hash table exceeds the product of the load factor and the
     * current capacity, the hash table is <i>rehashed</i> (that is, internal data
     * structures are rebuilt) so that the hash table has approximately twice the
     * number of buckets.
     *
     * <p>As a general rule, the default load factor (.75) offers a good
     * tradeoff between time and space costs.  Higher values decrease the
     * space overhead but increase the lookup cost (reflected in most of
     * the operations of the {@code HashMap} class, including
     * {@code get} and {@code put}).  The expected number of entries in
     * the map and its load factor should be taken into account when
     * setting its initial capacity, so as to minimize the number of
     * rehash operations.  If the initial capacity is greater than the
     * maximum number of entries divided by the load factor, no rehash
     * operations will ever occur.
     *
     * <p>If many mappings are to be stored in a {@code HashMap}
     * instance, creating it with a sufficiently large capacity will allow
     * the mappings to be stored more efficiently than letting it perform
     * automatic rehashing as needed to grow the table.  Note that using
     * many keys with the same {@code hashCode()} is a sure way to slow
     * down performance of any hash table. To ameliorate impact, when keys
     * are {@link Comparable}, this class may use comparison order among
     * keys to help break ties.
     *
     * <p><strong>Note that this implementation is not synchronized.</strong>
     * If multiple threads access a hash map concurrently, and at least one of
     * the threads modifies the map structurally, it <i>must</i> be
     * synchronized externally.  (A structural modification is any operation
     * that adds or deletes one or more mappings; merely changing the value
     * associated with a key that an instance already contains is not a
     * structural modification.)  This is typically accomplished by
     * synchronizing on some object that naturally encapsulates the map.
     *
     * If no such object exists, the map should be "wrapped" using the
     * {@link Collections#synchronizedMap Collections.synchronizedMap}
     * method.  This is best done at creation time, to prevent accidental
     * unsynchronized access to the map:<pre>
     *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
     *
     * <p>The iterators returned by all of this class's "collection view methods"
     * are <i>fail-fast</i>: if the map is structurally modified at any time after
     * the iterator is created, in any way except through the iterator's own
     * {@code remove} method, the iterator will throw a
     * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
     * modification, the iterator fails quickly and cleanly, rather than risking
     * arbitrary, non-deterministic behavior at an undetermined time in the
     * future.
     *
     * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
     * as it is, generally speaking, impossible to make any hard guarantees in the
     * presence of unsynchronized concurrent modification.  Fail-fast iterators
     * throw {@code ConcurrentModificationException} on a best-effort basis.
     * Therefore, it would be wrong to write a program that depended on this
     * exception for its correctness: <i>the fail-fast behavior of iterators
     * should be used only to detect bugs.</i>
     *
     * <p>This class is a member of the
     * <a href="{@docRoot}/java/util/package-summary.html#CollectionsFramework">
     * Java Collections Framework</a>.
     *
     * @param <K> the type of keys maintained by this map
     * @param <V> the type of mapped values
     *
     * @author  Doug Lea
     * @author  Josh Bloch
     * @author  Arthur van Hoff
     * @author  Neal Gafter
     * @see     Object#hashCode()
     * @see     Collection
     * @see     Map
     * @see     TreeMap
     * @see     Hashtable
     * @since   1.2
     */
    public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

        private static final long serialVersionUID = 362498820763181265L;

        /*
         * Implementation notes.
         *
         * This map usually acts as a binned (bucketed) hash table, but
         * when bins get too large, they are transformed into bins of
         * TreeNodes, each structured similarly to those in
         * java.util.TreeMap. Most methods try to use normal bins, but
         * relay to TreeNode methods when applicable (simply by checking
         * instanceof a node).  Bins of TreeNodes may be traversed and
         * used like any others, but additionally support faster lookup
         * when overpopulated. However, since the vast majority of bins in
         * normal use are not overpopulated, checking for existence of
         * tree bins may be delayed in the course of table methods.
         *
         * Tree bins (i.e., bins whose elements are all TreeNodes) are
         * ordered primarily by hashCode, but in the case of ties, if two
         * elements are of the same "class C implements Comparable<C>",
         * type then their compareTo method is used for ordering. (We
         * conservatively check generic types via reflection to validate
         * this -- see method comparableClassFor).  The added complexity
         * of tree bins is worthwhile in providing worst-case O(log n)
         * operations when keys either have distinct hashes or are
         * orderable, Thus, performance degrades gracefully under
         * accidental or malicious usages in which hashCode() methods
         * return values that are poorly distributed, as well as those in
         * which many keys share a hashCode, so long as they are also
         * Comparable. (If neither of these apply, we may waste about a
         * factor of two in time and space compared to taking no
         * precautions. But the only known cases stem from poor user
         * programming practices that are already so slow that this makes
         * little difference.)
         *
         * Because TreeNodes are about twice the size of regular nodes, we
         * use them only when bins contain enough nodes to warrant use
         * (see TREEIFY_THRESHOLD). And when they become too small (due to
         * removal or resizing) they are converted back to plain bins.  In
         * usages with well-distributed user hashCodes, tree bins are
         * rarely used.  Ideally, under random hashCodes, the frequency of
         * nodes in bins follows a Poisson distribution
         * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
         * parameter of about 0.5 on average for the default resizing
         * threshold of 0.75, although with a large variance because of
         * resizing granularity. Ignoring variance, the expected
         * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
         * factorial(k)). The first values are:
         *
         * 0:    0.60653066
         * 1:    0.30326533
         * 2:    0.07581633
         * 3:    0.01263606
         * 4:    0.00157952
         * 5:    0.00015795
         * 6:    0.00001316
         * 7:    0.00000094
         * 8:    0.00000006
         * more: less than 1 in ten million
         *
         * The root of a tree bin is normally its first node.  However,
         * sometimes (currently only upon Iterator.remove), the root might
         * be elsewhere, but can be recovered following parent links
         * (method TreeNode.root()).
         *
         * All applicable internal methods accept a hash code as an
         * argument (as normally supplied from a public method), allowing
         * them to call each other without recomputing user hashCodes.
         * Most internal methods also accept a "tab" argument, that is
         * normally the current table, but may be a new or old one when
         * resizing or converting.
         *
         * When bin lists are treeified, split, or untreeified, we keep
         * them in the same relative access/traversal order (i.e., field
         * Node.next) to better preserve locality, and to slightly
         * simplify handling of splits and traversals that invoke
         * iterator.remove. When using comparators on insertion, to keep a
         * total ordering (or as close as is required here) across
         * rebalancings, we compare classes and identityHashCodes as
         * tie-breakers.
         *
         * The use and transitions among plain vs tree modes is
         * complicated by the existence of subclass LinkedHashMap. See
         * below for hook methods defined to be invoked upon insertion,
         * removal and access that allow LinkedHashMap internals to
         * otherwise remain independent of these mechanics. (This also
         * requires that a map instance be passed to some utility methods
         * that may create new nodes.)
         *
         * The concurrent-programming-like SSA-based coding style helps
         * avoid aliasing errors amid all of the twisty pointer operations.
         */

        /**
         * The default initial capacity - MUST be a power of two.
         */
        static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

        /**
         * The maximum capacity, used if a higher value is implicitly specified
         * by either of the constructors with arguments.
         * MUST be a power of two <= 1<<30.
         */
        static final int MAXIMUM_CAPACITY = 1 << 30;

        /**
         * The load factor used when none specified in constructor.
         */
        static final float DEFAULT_LOAD_FACTOR = 0.75f;

        /**
         * The bin count threshold for using a tree rather than list for a
         * bin.  Bins are converted to trees when adding an element to a
         * bin with at least this many nodes. The value must be greater
         * than 2 and should be at least 8 to mesh with assumptions in
         * tree removal about conversion back to plain bins upon
         * shrinkage.
         */
        static final int TREEIFY_THRESHOLD = 8;

        /**
         * The bin count threshold for untreeifying a (split) bin during a
         * resize operation. Should be less than TREEIFY_THRESHOLD, and at
         * most 6 to mesh with shrinkage detection under removal.
         */
        static final int UNTREEIFY_THRESHOLD = 6;

        /**
         * The smallest table capacity for which bins may be treeified.
         * (Otherwise the table is resized if too many nodes in a bin.)
         * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
         * between resizing and treeification thresholds.
         */
        static final int MIN_TREEIFY_CAPACITY = 64;

        /**
         * Basic hash bin node, used for most entries.  (See below for
         * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
         */
        static class Node<K,V> implements Map.Entry<K,V> {
            final int hash;
            final K key;
            V value;
            Node<K,V> next;

            Node(int hash, K key, V value, Node<K,V> next) {
                this.hash = hash;
                this.key = key;
                this.value = value;
                this.next = next;
            }

            public final K getKey()        { return key; }
            public final V getValue()      { return value; }
            public final String toString() { return key + "=" + value; }

            public final int hashCode() {
                return Objects.hashCode(key) ^ Objects.hashCode(value);
            }

            public final V setValue(V newValue) {
                V oldValue = value;
                value = newValue;
                return oldValue;
            }

            public final boolean equals(Object o) {
                if (o == this)
                    return true;
                if (o instanceof Map.Entry) {
                    Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                    if (Objects.equals(key, e.getKey()) &&
                        Objects.equals(value, e.getValue()))
                        return true;
                }
                return false;
            }
        }

        /* ---------------- Static utilities -------------- */

        /**
         * Computes key.hashCode() and spreads (XORs) higher bits of hash
         * to lower.  Because the table uses power-of-two masking, sets of
         * hashes that vary only in bits above the current mask will
         * always collide. (Among known examples are sets of Float keys
         * holding consecutive whole numbers in small tables.)  So we
         * apply a transform that spreads the impact of higher bits
         * downward. There is a tradeoff between speed, utility, and
         * quality of bit-spreading. Because many common sets of hashes
         * are already reasonably distributed (so don't benefit from
         * spreading), and because we use trees to handle large sets of
         * collisions in bins, we just XOR some shifted bits in the
         * cheapest possible way to reduce systematic lossage, as well as
         * to incorporate impact of the highest bits that would otherwise
         * never be used in index calculations because of table bounds.
         */
        static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        /**
         * Returns x's Class if it is of the form "class C implements
         * Comparable<C>", else null.
         */
        static Class<?> comparableClassFor(Object x) {
            if (x instanceof Comparable) {
                Class<?> c; Type[] ts, as; ParameterizedType p;
                if ((c = x.getClass()) == String.class) // bypass checks
                    return c;
                if ((ts = c.getGenericInterfaces()) != null) {
                    for (Type t : ts) {
                        if ((t instanceof ParameterizedType) &&
                            ((p = (ParameterizedType) t).getRawType() ==
                             Comparable.class) &&
                            (as = p.getActualTypeArguments()) != null &&
                            as.length == 1 && as[0] == c) // type arg is c
                            return c;
                    }
                }
            }
            return null;
        }

        /**
         * Returns k.compareTo(x) if x matches kc (k's screened comparable
         * class), else 0.
         */
        @SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
        static int compareComparables(Class<?> kc, Object k, Object x) {
            return (x == null || x.getClass() != kc ? 0 :
                    ((Comparable)k).compareTo(x));
        }

        /**
         * Returns a power of two size for the given target capacity.
         */
        static final int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        /* ---------------- Fields -------------- */

        /**
         * The table, initialized on first use, and resized as
         * necessary. When allocated, length is always a power of two.
         * (We also tolerate length zero in some operations to allow
         * bootstrapping mechanics that are currently not needed.)
         */
        transient Node<K,V>[] table;

        /**
         * Holds cached entrySet(). Note that AbstractMap fields are used
         * for keySet() and values().
         */
        transient Set<Map.Entry<K,V>> entrySet;

        /**
         * The number of key-value mappings contained in this map.
         */
        transient int size;

        /**
         * The number of times this HashMap has been structurally modified
         * Structural modifications are those that change the number of mappings in
         * the HashMap or otherwise modify its internal structure (e.g.,
         * rehash).  This field is used to make iterators on Collection-views of
         * the HashMap fail-fast.  (See ConcurrentModificationException).
         */
        transient int modCount;

        /**
         * The next size value at which to resize (capacity * load factor).
         *
         * @serial
         */
        // (The javadoc description is true upon serialization.
        // Additionally, if the table array has not been allocated, this
        // field holds the initial array capacity, or zero signifying
        // DEFAULT_INITIAL_CAPACITY.)
        int threshold;

        /**
         * The load factor for the hash table.
         *
         * @serial
         */
        final float loadFactor;

        /* ---------------- Public operations -------------- */

        /**
         * Constructs an empty {@code HashMap} with the specified initial
         * capacity and load factor.
         *
         * @param  initialCapacity the initial capacity
         * @param  loadFactor      the load factor
         * @throws IllegalArgumentException if the initial capacity is negative
         *         or the load factor is nonpositive
         */
        public HashMap(int initialCapacity, float loadFactor) {
            if (initialCapacity < 0)
                throw new IllegalArgumentException("Illegal initial capacity: " +
                                                   initialCapacity);
            if (initialCapacity > MAXIMUM_CAPACITY)
                initialCapacity = MAXIMUM_CAPACITY;
            if (loadFactor <= 0 || Float.isNaN(loadFactor))
                throw new IllegalArgumentException("Illegal load factor: " +
                                                   loadFactor);
            this.loadFactor = loadFactor;
            this.threshold = tableSizeFor(initialCapacity);
        }

        /**
         * Constructs an empty {@code HashMap} with the specified initial
         * capacity and the default load factor (0.75).
         *
         * @param  initialCapacity the initial capacity.
         * @throws IllegalArgumentException if the initial capacity is negative.
         */
        public HashMap(int initialCapacity) {
            this(initialCapacity, DEFAULT_LOAD_FACTOR);
        }

        /**
         * Constructs an empty {@code HashMap} with the default initial capacity
         * (16) and the default load factor (0.75).
         */
        public HashMap() {
            this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
        }

        /**
         * Constructs a new {@code HashMap} with the same mappings as the
         * specified {@code Map}.  The {@code HashMap} is created with
         * default load factor (0.75) and an initial capacity sufficient to
         * hold the mappings in the specified {@code Map}.
         *
         * @param   m the map whose mappings are to be placed in this map
         * @throws  NullPointerException if the specified map is null
         */
        public HashMap(Map<? extends K, ? extends V> m) {
            this.loadFactor = DEFAULT_LOAD_FACTOR;
            putMapEntries(m, false);
        }
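
To connect the sizing fields in the excerpt above with the question's "1000 buckets" assumption, here is a small sketch. `tableSizeFor` is copied from the excerpt; `capacityAfter` is a hypothetical helper that assumes the table doubles whenever the size exceeds capacity × load factor, as the class Javadoc describes. Under those assumptions, `new HashMap<>(1000)` would start with 1024 buckets (the constructor stores `tableSizeFor(initialCapacity)` in `threshold` until the table is allocated), and a default map holding 1100 entries would have grown to 2048 buckets.

```java
// Reproduces HashMap.tableSizeFor from the excerpt and simulates default growth.
class HashMapSizingDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the excerpt: smallest power of two >= cap (1 for cap <= 0).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper: bucket count after inserting `entries` mappings into a
    // map created with the default capacity (16), assuming the table doubles
    // whenever size exceeds capacity * loadFactor.
    static int capacityAfter(int entries, float loadFactor) {
        int capacity = 16; // DEFAULT_INITIAL_CAPACITY
        while (entries > (int) (capacity * loadFactor) && capacity < MAXIMUM_CAPACITY) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1000));         // prints 1024
        System.out.println(tableSizeFor(16));           // prints 16
        System.out.println(capacityAfter(1100, 0.75f)); // prints 2048
    }
}
```

Because the bucket count always stays a power of two, the index can be computed with a bit mask, and the load factor keeps the average number of entries per bucket below one.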