Is hashCode unique for Strings?

Time: 2014-06-01 08:02:15

Tags: java hashcode

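Recently I came across a piece of code that used a Map<Integer, String>, where the Integer key is the hashCode of some string and the String value is that string.

Is this the right thing to do? With this design, equals is never called on the String when calling get. (The lookup is also done with the result of calling hashCode() on a String object.)

Or are hashCodes unique for distinct Strings?

I looked at equals in the String class. There is logic written for it. I am confused.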

5 answers:

Answer 0: (score: 9)

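HashMap uses equals() to compare keys. It uses hashCode() only to find the bucket in which the key is located, which drastically reduces the number of keys that have to be compared with equals().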
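The bucket-then-equals behaviour described above can be observed with two strings that are known to share a hash code. This is only a minimal sketch; the class name `BucketLookupDemo` and the method `buildMap` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BucketLookupDemo {

    // Stores two String keys whose hashCode() values are identical.
    static Map<String, String> buildMap() {
        Map<String, String> map = new HashMap<>();
        map.put("FB", "first");
        map.put("Ea", "second");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = buildMap();
        // Same hash code, so both keys end up in the same bucket ...
        System.out.println("FB".hashCode() == "Ea".hashCode()); // true
        // ... but equals() still tells the keys apart inside that bucket.
        System.out.println(map.size());    // 2
        System.out.println(map.get("FB")); // first
        System.out.println(map.get("Ea")); // second
    }
}
```

Because the keys are the strings themselves, the collision costs at most an extra equals() call and never returns the wrong value.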

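Obviously, hashCode() cannot produce unique values, since an int is limited to 2^32 distinct values and there are infinitely many possible String values.

In conclusion, the result of hashCode() is not suitable as a key in a Map.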

Answer 1: (score: 4)

No, they are not unique. For example, "FB" and "Ea" both have hashCode = 2236.
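The value 2236 can be checked by hand with the formula documented for String.hashCode(), s[0]*31^(n-1) + ... + s[n-1]. The sketch below re-implements that formula in a hypothetical helper `manualHash` and also shows that concatenating colliding blocks of equal length produces further collisions.

```java
class StringHashDemo {

    // Re-implements the documented String.hashCode() formula with int arithmetic:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("FB")); // 70*31 + 66 = 2236
        System.out.println(manualHash("Ea")); // 69*31 + 97 = 2236
        // Any sequence built from the blocks "FB" and "Ea" collides with
        // every other sequence of the same number of blocks.
        for (String s : new String[] {"FBFB", "FBEa", "EaFB", "EaEa"}) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

With k blocks this gives 2^k distinct strings that all share one hash code, so collisions are easy to construct, not just theoretically possible.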

Answer 2: (score: 1)

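This is not correct, because hashCode() values can collide. hashCode() makes no guarantee of uniqueness, so distinct strings may share the same value, and you cannot use it as a key. If you need a unique handle for a string, you can use a cryptographic hash as the key:

The following code shows how to do this: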

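import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Set; // referenced from the Javadoc only

/**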
 * Immutable class that represents a unique key for a string. This unique key
 * can be used as a key in a hash map, without the likelihood of a collision:
 * generating the same key for a different String. Note that you should first
 * check if keeping a key to a reference of a string is feasible, in that case a
 * {@link Set} may suffice.
 * <P>
 * This class utilizes SHA-512 to generate the keys and uses
 * {@link StandardCharsets#UTF_8} for the encoding of the strings. If a smaller
 * output size than 512 bits (64 bytes) is required then the leftmost bytes of
 * the SHA-512 hash are used. Smaller keys are therefore prefixes of larger
 * keys over the same String value.
 * <P>
 * Note that it is not impossible to create collisions for key sizes up to 8-20
 * bytes.
 * 
 * @author owlstead
 */
public final class UniqueKey implements Serializable {

    public static final int MIN_DIGEST_SIZE_BYTES = 8;
    public static final int MAX_DIGEST_SIZE_BYTES = 64;

    /**
     * Creates a unique key for a string with the maximum size of 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input) {
        return doCreateUniqueKey(input, MAX_DIGEST_SIZE_BYTES);
    }

    /**
     * Creates a unique key for a string with a size of 8 to 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @param outputSizeBytes
     *            the output size
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input,
            final int outputSizeBytes) {
        return doCreateUniqueKey(input, outputSizeBytes);
    }

    @Override
    public boolean equals(final Object obj) {
        if (!(obj instanceof UniqueKey)) {
            return false;
        }
        final UniqueKey that = (UniqueKey) obj;
        return ByteBuffer.wrap(this.key).equals(ByteBuffer.wrap(that.key));
    }

    @Override
    public int hashCode() {
        return ByteBuffer.wrap(this.key).hashCode();
    }

    /**
     * Outputs an - in itself - unique String representation of this key.
     * 
     * @return the string <CODE>"{Key: [HEX ENCODED KEY]}"</CODE>
     */
    @Override
    public String toString() {
        // non-optimal but readable conversion to hexadecimal
        final StringBuilder sb = new StringBuilder(this.key.length * 2);
        sb.append("{Key: ");
        for (int i = 0; i < this.key.length; i++) {
            sb.append(String.format("%02X", this.key[i]));
        }
        sb.append("}");
        return sb.toString();
    }

    /**
     * Makes it possible to retrieve the underlying key data (e.g. to use a
     * different encoding).
     * 
     * @return the data in a read only ByteBuffer
     */
    public ByteBuffer asReadOnlyByteBuffer() {
        return ByteBuffer.wrap(this.key).asReadOnlyBuffer();
    }

    private static final long serialVersionUID = 1L;

    private static final int BUFFER_SIZE = 512;

    // byte array instead of ByteBuffer to support serialization
    private final byte[] key;

    private static UniqueKey doCreateUniqueKey(final CharSequence input,
            final int outputSizeBytes) {

        // --- setup digest

        final MessageDigest digestAlgorithm;
        try {
            // note: relatively fast on 64 bit systems (faster than SHA-256!)
            digestAlgorithm = MessageDigest.getInstance("SHA-512");
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException(
                    "SHA-512 is not available in this Java runtime", e);
        }

        // --- validate input parameters

        if (outputSizeBytes < MIN_DIGEST_SIZE_BYTES
                || outputSizeBytes > MAX_DIGEST_SIZE_BYTES) {
            throw new IllegalArgumentException(
                    "Unique key size either too small or too big");
        }

        // --- setup loop

        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        final CharBuffer buffer = CharBuffer.wrap(input);
        final ByteBuffer encodedBuffer = ByteBuffer.allocate(BUFFER_SIZE);
        CoderResult coderResult;

        // --- loop over all characters
        // (instead of encoding everything to byte[] at once - peak memory!)

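        while (buffer.hasRemaining()) {
            coderResult = encoder.encode(buffer, encodedBuffer, false);
            if (coderResult.isError()) {
                throw new IllegalArgumentException(
                        "Invalid code point in input string");
            }
            encodedBuffer.flip();
            digestAlgorithm.update(encodedBuffer);
            encodedBuffer.clear();
            if (coderResult.isUnderflow()) {
                // everything encodable has been consumed; a dangling
                // surrogate (if any) is rejected by the final call below,
                // which avoids looping forever on such input
                break;
            }
        }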

        coderResult = encoder.encode(buffer, encodedBuffer, true);
        if (coderResult.isError()) {
            throw new IllegalArgumentException(
                    "Invalid code point in input string");
        }
        encodedBuffer.flip();
        digestAlgorithm.update(encodedBuffer);
        // no need to clear encodedBuffer if generated locally

        // --- resize result if required

        final byte[] digest = digestAlgorithm.digest();
        final byte[] result;
        if (outputSizeBytes == digest.length) {
            result = digest;
        } else {
            result = Arrays.copyOf(digest, outputSizeBytes);
        }

        // --- and return the final, possibly resized, result

        return new UniqueKey(result);
    }

    private UniqueKey(final byte[] key) {
        this.key = key;
    }
}
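For comparison, the same idea can be sketched far more compactly when streaming the input and truncating the digest are not needed. This is an assumption-laden simplification, not a replacement for the class above: the helper `digestKey` and the class `DigestKeyDemo` are hypothetical names, it uses SHA-256 instead of SHA-512, and it encodes the whole string to a byte array at once.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class DigestKeyDemo {

    // Hypothetical helper: hex-encoded SHA-256 digest of the UTF-8 bytes of s.
    static String digestKey(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        // "FB" and "Ea" collide under String.hashCode() but not under SHA-256.
        Map<String, String> byDigest = new HashMap<>();
        byDigest.put(digestKey("FB"), "FB");
        byDigest.put(digestKey("Ea"), "Ea");
        System.out.println(byDigest.size());          // 2
        System.out.println(digestKey("FB").length()); // 64 hex characters
    }
}
```

A digest key like this is only worthwhile when the original strings cannot be kept as keys; otherwise using the String itself is simpler and exact.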

Answer 3: (score: 0)

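By doing its own hashing of the key String, that code runs the risk that two different key strings produce the same Integer map key, in which case the code will fail.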

Normally, the code should use a Map<String, String>.
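The failure mode can be reproduced in a few lines. The sketch below imitates the design from the question as far as it is described there (the original code is not shown, so `indexByHashCode` is a hypothetical stand-in) and feeds it two strings with the same hash code.

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeAsKeyDemo {

    // Mimics the questioned design: the map key is the hashCode of the value.
    static Map<Integer, String> indexByHashCode(String... values) {
        Map<Integer, String> map = new HashMap<>();
        for (String v : values) {
            map.put(v.hashCode(), v);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = indexByHashCode("FB", "Ea");
        // Both strings map to key 2236, so the second put overwrites the first.
        System.out.println(map.size());               // 1
        System.out.println(map.get("FB".hashCode())); // Ea
    }
}
```

Looking up "FB" silently returns "Ea", which is the kind of error a Map<String, String> cannot produce.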

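However, the author may have done this for a good (i.e. deliberate) reason, in which case it is not a bug. Without seeing the code, we cannot tell.

Answer 4: (score: 0)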

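hashCode is used to improve the performance of collections. Take, for example, a Set, which holds unique items. When a new item is added, hashCode is used to check whether the item is already present in the set.

If there is a collision, we fall back to the (usually) slower equals comparison.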

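So it is important for performance that hashCode returns well-distributed values. More important still, an object's hashCode must be consistent between calls, given the same internal state.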
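To illustrate that collisions affect speed rather than correctness, here is a small sketch with a hypothetical value type `Name` whose hashCode() is legal but deliberately poor: it returns the same constant for every instance.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {

    // Hypothetical value type with a consistent but useless hashCode().
    static final class Name {
        private final String value;

        Name(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && ((Name) o).value.equals(value);
        }

        @Override
        public int hashCode() {
            return 42; // every instance collides with every other
        }
    }

    // Counts how many distinct names a HashSet keeps.
    static int distinctCount(String... values) {
        Set<Name> set = new HashSet<>();
        for (String v : values) {
            set.add(new Name(v));
        }
        return set.size();
    }

    public static void main(String[] args) {
        // The set is still correct: equals() resolves every collision.
        System.out.println(distinctCount("a", "b", "a")); // 2
    }
}
```

The set stays correct because equals() decides membership, but every lookup now has to search a single crowded bucket, which is the performance cost described above.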

Hence, using hashCode values as keys in a map is unwise! They are not unique.