Apache Commons Lang HashCodeBuilder collision

Asked: 2015-05-07 18:42:42

Tags: java hash apache-commons hash-collision apache-commons-lang

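I am running into a collision when using Apache Commons Lang HashCodeBuilder, version 3.4. I am hashing a Route object that contains two Cell objects, start and end. At the end I give an example in which the collision occurs. Both classes override the hashCode and equals methods. First, the Cell class: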

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class Cell {
    private int east;
    private int south;

    public Cell(int east, int south) {
        this.east = east;
        this.south = south;
    }

    public int getEast() {
        return east;
    }

    public void setEast(int east) {
        this.east = east;
    }

    public int getSouth() {
        return south;
    }

    public void setSouth(int south) {
        this.south = south;
    }

    /**
     * Compute hash code by using Apache Commons Lang HashCodeBuilder.
     */
    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 31)
                .append(this.south)
                .append(this.east)
                .toHashCode();
    }

    /**
     * Compute equals by using Apache Commons Lang EqualsBuilder.
     */
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Cell))
            return false;
        if (obj == this)
            return true;

        Cell cell = (Cell) obj;
        return new EqualsBuilder()
                .append(this.south, cell.south)
                .append(this.east, cell.east)
                .isEquals();
    }
}

And the Route class:

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

import java.util.*;

public class Route {
    private Cell startCell;
    private Cell endCell;

    public Route(Cell startCell, Cell endCell) {
        this.startCell = startCell;
        this.endCell = endCell;
    }

    public Cell getStartCell() {
        return startCell;
    }

    public void setStartCell(Cell startCell) {
        this.startCell = startCell;
    }

    public Cell getEndCell() {
        return endCell;
    }

    public void setEndCell(Cell endCell) {
        this.endCell = endCell;
    }


    @Override
    public int hashCode() {
        return new HashCodeBuilder(43, 59)
                .append(this.startCell)
                .append(this.endCell)
                .toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Route))
            return false;
        if (obj == this)
            return true;

        Route route = (Route) obj;
        return new EqualsBuilder()
                .append(this.startCell, route.startCell)
                .append(this.endCell, route.endCell)
                .isEquals();
    }
}

An example collision:

public class Collision {
    public static void main(String[] args) {
        Route route1 = new Route(new Cell(154, 156), new Cell(154, 156));
        Route route2 = new Route(new Cell(153, 156), new Cell(151, 158));

        System.out.println(route1.hashCode() + " " + route2.hashCode());
    }
}

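This prints 1429303 1429303. Now, if I change the initial odd number and the multiplier odd number so that both classes use the same pair, this example no longer collides. But the HashCodeBuilder documentation clearly states: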

  

Two randomly chosen odd numbers must be passed in. Ideally these should be different for each class, however this is not vital.

Ideally, if it is possible, I would like a perfect hash function (an injective function) for my example.

2 answers:

Answer 0 (score: 0)

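In Java, hash codes are limited to the int range (32 bits), which means collisions are guaranteed once you have more than 2^32 distinct objects, even with an ideally distributed hash. In practice collisions occur much sooner, because real hash functions are imperfect.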
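
To make the point about imperfect hashing concrete, the collision from the question can be traced by hand. The sketch below assumes that HashCodeBuilder folds each appended int as total * multiplier + value, and that appending a non-array object appends its hashCode(). It re-implements that arithmetic in plain Java rather than calling the library, and the class and method names are made up for illustration.

```java
// Hypothetical stand-alone re-implementation of the arithmetic; the library is not called.
class CollisionTrace {
    // Mirrors Cell.hashCode(): initial value 17, multiplier 31, south appended before east.
    // Assumed folding step: total = total * multiplier + value.
    static int cellHash(int east, int south) {
        int total = 17;
        total = total * 31 + south;
        total = total * 31 + east;
        return total;
    }

    // Mirrors Route.hashCode(): initial value 43, multiplier 59, start cell before end cell.
    static int routeHash(int startHash, int endHash) {
        int total = 43;
        total = total * 59 + startHash;
        total = total * 59 + endHash;
        return total;
    }

    public static void main(String[] args) {
        int route1 = routeHash(cellHash(154, 156), cellHash(154, 156));
        int route2 = routeHash(cellHash(153, 156), cellHash(151, 158));
        System.out.println(route1 + " " + route2); // prints "1429303 1429303"
    }
}
```

The start-cell hashes differ by 1 (21327 vs 21326), which the Route multiplier scales to 59, while the end-cell hashes differ by 59 in the opposite direction (21327 vs 21386), so the two differences cancel. With 31 as the multiplier in both classes the same inputs would differ by 28, which is why that change happens to avoid this particular collision.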

Answer 1 (score: 0)

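You can spread the generated hash codes more evenly by feeding more parameters into the hash code (this has nothing to do with the Apache Commons library). In this example, you could precompute one or more properties of the Route class and use them when generating the hash code. For example, compute the slope of the line between the two Cell objects: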

// Inside Route.hashCode(): derive a slope-like value from the two cells.
double slope = startCell.getEast() - endCell.getEast();
if (slope == 0) { // prevent division by 0
    slope = startCell.getSouth() - endCell.getSouth();
} else {
    slope = (startCell.getSouth() - endCell.getSouth()) / slope;
}

// Append the derived value after the two cells.
return new HashCodeBuilder(43, 59)
        .append(this.startCell)
        .append(this.endCell)
        .append(slope)
        .toHashCode();

With your example this yields 83091911 83088489. Alternatively (or in addition), use the distance between the two Cell objects:

// Inside Route.hashCode(): Euclidean distance between the two cells.
double length = Math.sqrt(
        Math.pow(startCell.getSouth() - endCell.getSouth(), 2)
        + Math.pow(startCell.getEast() - endCell.getEast(), 2));
return new HashCodeBuilder(43, 59)
        .append(this.startCell)
        .append(this.endCell)
        .append(length)
        .toHashCode();

Used on its own with your example, this yields 83091911 -486891382.

And a test of whether this prevents collisions:

// Requires java.util.ArrayList, HashSet, List and Set.
// Build every cell on a 50 x 50 grid.
List<Cell> cells = new ArrayList<>();
for (int i = 0; i < 50; i++) {
    for (int j = 0; j < 50; j++) {
        cells.add(new Cell(i, j));
    }
}
System.out.println(cells.size() + " cells generated");
System.out.println("Testing " + (cells.size() * cells.size()) + " routes");

// Hash every ordered pair of cells and count hash codes that were seen before.
Set<Integer> seen = new HashSet<>();
int collisions = 0;
for (Cell start : cells) {
    for (Cell end : cells) {
        // add() returns false when the hash code is already in the set
        if (!seen.add(new Route(start, end).hashCode())) {
            collisions++;
        }
    }
}
System.out.println(collisions);

Of the 6,250,000 routes generated:

  1. Without length and slope: 6,155,919 collisions
  2. With length and slope: 873,047 collisions
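
The first figure can be cross-checked without the library, and the same loop shows what an injective hash would look like on this grid. The sketch below is not the answerer's code. It assumes HashCodeBuilder folds values as total * multiplier + value (re-implemented here in plain Java, with hypothetical class and method names), and it assumes every coordinate fits in 8 bits (0 to 255). That holds for the 50 x 50 test grid and for the question's example, but the question does not state it as a general constraint. Under that assumption the four coordinates can be packed into one 32-bit int, so distinct routes always get distinct hash codes.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and the library is not called.
class RouteHashCount {
    // Assumed arithmetic of the question's Cell and Route hashCode() methods.
    static int builderStyleHash(int e1, int s1, int e2, int s2) {
        int start = (17 * 31 + s1) * 31 + e1;
        int end = (17 * 31 + s2) * 31 + e2;
        return (43 * 59 + start) * 59 + end;
    }

    // Injective only under the assumption 0 <= coordinate <= 255.
    static int packedHash(int e1, int s1, int e2, int s2) {
        return (e1 << 24) | (s1 << 16) | (e2 << 8) | s2;
    }

    // Collisions over all routes on an n x n grid: total routes minus distinct hash codes.
    static int countCollisions(int n, boolean packed) {
        int[] hashes = new int[n * n * n * n];
        int k = 0;
        for (int e1 = 0; e1 < n; e1++)
            for (int s1 = 0; s1 < n; s1++)
                for (int e2 = 0; e2 < n; e2++)
                    for (int s2 = 0; s2 < n; s2++)
                        hashes[k++] = packed
                                ? packedHash(e1, s1, e2, s2)
                                : builderStyleHash(e1, s1, e2, s2);
        Arrays.sort(hashes);
        int distinct = hashes.length == 0 ? 0 : 1;
        for (int i = 1; i < hashes.length; i++) {
            if (hashes[i] != hashes[i - 1]) {
                distinct++;
            }
        }
        return hashes.length - distinct;
    }

    public static void main(String[] args) {
        System.out.println(countCollisions(50, false)); // prints 6155919
        System.out.println(countCollisions(50, true));  // prints 0
    }
}
```

On this grid the builder-style hash can only take 94,081 distinct values, because each multiplier is smaller than the range of the value appended after it, so neighbouring inputs overlap. That leaves 6,250,000 - 94,081 = 6,155,919 collisions. The packed variant trades generality for injectivity: it stops being a perfect hash as soon as a coordinate falls outside 0 to 255.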