How to implement a key-value pair with variability in the key

Asked: 2019-01-07 12:55:47

Tags: java treemap comparable

I'm writing some code to de-duplicate data based on 2 fields:

  1. A string of characters, we'll call this the UMI
  2. An array of integers

I've created a POJO to hold this data and serve as the key for a TreeMap. The full set of data is held in the value, so I only keep relevant data in memory.

However, the next requirement is to allow variability in the UMI AND the integers. For example, the following two pieces of data would be considered duplicates, given an allowed UMI variability (mismatch) of 1.

a. "AAA", [200,300]

b. "ABA", [200,300]

Similarly, the following would be considered duplicates based on the integer array, given a mismatch allowance of 2.

a. "AAA", [201,300]

b. "AAA", [203,300]
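The integer rule above can be sketched as a small predicate. This is only an illustration: the class and method names are hypothetical, and it assumes the allowance applies to each position independently and that arrays of different lengths never match (which is how the code further down treats them).

```java
final class PositionTolerance {

    // Returns true when both arrays have the same length and every pair of
    // corresponding elements differs by at most `allowed`.
    // Assumption: the allowance is applied per position, not summed.
    static boolean withinTolerance(int[] a, int[] b, int allowed) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The example pair from the text, with an allowance of 2.
        System.out.println(withinTolerance(new int[] {201, 300}, new int[] {203, 300}, 2)); // true
        // One step further apart than the allowance permits.
        System.out.println(withinTolerance(new int[] {201, 300}, new int[] {204, 300}, 2)); // false
    }
}
```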

My current attempt has been to make this POJO implement the Comparable interface, and to write the compareTo method so that it takes the variability into account:

import java.util.Arrays;

public class UMIPrimoKey implements Comparable<UMIPrimoKey> {

    private final String UMI;
    private final int[] ints;
    private final int umiMisMatch; // allowed number of differing UMI characters
    private final int posMisMatch; // allowed difference per integer position

    public UMIPrimoKey(String UMI, int[] ints, int umiMisMatch, int posMisMatch) {
        this.UMI = UMI;
        this.ints = ints;
        this.umiMisMatch = umiMisMatch;
        this.posMisMatch = posMisMatch;
    }

    @Override
    public int compareTo(UMIPrimoKey o) {
        // Integer arrays: a different length, or any element further apart
        // than posMisMatch, yields -1.
        if (!Arrays.equals(ints, o.ints)) {
            if (ints.length == o.ints.length) {
                for (int i = 0; i < ints.length; i++) {
                    if (Math.abs(ints[i] - o.ints[i]) > posMisMatch) {
                        return -1;
                    }
                }
            } else {
                return -1;
            }
        }

        // Integers are within tolerance: compare the UMIs.
        if (XsamStringUtils.numberOfDifferences(UMI, o.UMI) <= umiMisMatch) {
            return 0;
        }

        // UMIs differ by more than umiMisMatch.
        return 1;
    }
}

XsamStringUtils.numberOfDifferences is just a simple static method to count the number of differences between the two UMIs.
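Since XsamStringUtils is not shown, here is a hypothetical stand-in for such a method, assuming it counts differing positions between two equal-length strings (a Hamming distance). The real implementation may handle unequal lengths differently; the class name below is made up for illustration.

```java
final class UmiDistance {

    // Counts the positions at which the two strings differ.
    // Assumption: UMIs always have the same length; anything else is
    // rejected here rather than guessed at.
    static int numberOfDifferences(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("UMIs must have equal length");
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences;
    }
}
```

With this sketch, numberOfDifferences("AAA", "ABA") is 1, so the two UMIs would count as matching under a mismatch allowance of 1.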

I return -1 if any two corresponding integers from the arrays differ by more than the allowed mismatch (posMisMatch). 0 is returned if the integers are within tolerance and the number of mismatches in the UMI is at most the allowed amount, specified by umiMisMatch.

Otherwise, 1 is returned as the UMIs don't match.

I've then used this in a TreeMap which takes into account the compareTo method.

This works in my unit tests, with small numbers of UMIPrimoKeys added to the map, but I'm getting some strange results when running the completed program. It's probably due to the rules for the method outlined here: https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html but I'm finding it hard to adapt the code to take the rules into account.

Any direction is appreciated, thanks for reading!

1 answer:

Answer 0 (score: 2)

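According to the docs for compareTo:

The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)

The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.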

I don't think your code is correct with respect to these rules, which may cause the get function to fail to find your entries.
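To make this concrete, here is a minimal sketch that re-implements the question's comparison on a simplified key and checks two of the Comparable rules against it. FuzzyKey and ContractCheck are hypothetical names, the tolerances are hard-coded to the values from the question's examples (1 for the UMI, 2 for the integers), and the UMI difference count is assumed to be a plain position-by-position comparison of equal-length strings.

```java
import java.util.Arrays;

// Simplified copy of the question's key; tolerances are fixed for illustration.
final class FuzzyKey implements Comparable<FuzzyKey> {
    static final int UMI_MISMATCH = 1;
    static final int POS_MISMATCH = 2;

    final String umi;
    final int[] ints;

    FuzzyKey(String umi, int[] ints) {
        this.umi = umi;
        this.ints = ints;
    }

    // Assumed stand-in for XsamStringUtils.numberOfDifferences (equal lengths only).
    static int differences(String a, String b) {
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    // Same decision logic as the compareTo in the question.
    @Override
    public int compareTo(FuzzyKey o) {
        if (!Arrays.equals(ints, o.ints)) {
            if (ints.length != o.ints.length) return -1;
            for (int i = 0; i < ints.length; i++) {
                if (Math.abs(ints[i] - o.ints[i]) > POS_MISMATCH) return -1;
            }
        }
        return differences(umi, o.umi) <= UMI_MISMATCH ? 0 : 1;
    }
}

final class ContractCheck {
    // Rule 1: sgn(x.compareTo(y)) == -sgn(y.compareTo(x)).
    static boolean antisymmetric(FuzzyKey x, FuzzyKey y) {
        return Integer.signum(x.compareTo(y)) == -Integer.signum(y.compareTo(x));
    }

    // Rule 3: if x.compareTo(y) == 0, x and y must compare the same way against z.
    static boolean equalKeysAgree(FuzzyKey x, FuzzyKey y, FuzzyKey z) {
        return x.compareTo(y) != 0
                || Integer.signum(x.compareTo(z)) == Integer.signum(y.compareTo(z));
    }

    public static void main(String[] args) {
        int[] pos = {200, 300};
        FuzzyKey aaa = new FuzzyKey("AAA", pos);
        FuzzyKey aab = new FuzzyKey("AAB", pos);
        FuzzyKey abb = new FuzzyKey("ABB", pos);
        FuzzyKey bbb = new FuzzyKey("BBB", pos);

        // Both directions return 1, so each key claims to be "greater" than the other.
        System.out.println(antisymmetric(aaa, bbb)); // false
        // AAA "equals" AAB and AAB "equals" ABB, yet AAA does not "equal" ABB.
        System.out.println(equalKeysAgree(aaa, aab, abb)); // false
    }
}
```

Both checks print false. A TreeMap decides where to descend at each node from the sign of compareTo, so with results like these the path taken during a lookup depends on which keys happen to sit along the way, and an entry that was inserted may not be reached again. More generally, a "within N mismatches" match is not transitive, so it cannot be expressed as compareTo returning 0 without breaking the third rule.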