Comparing two LinkedHashMaps whose values are lists

Time: 2017-02-13 10:55:11

Tags: java nlp linkedhashmap

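I have asked this question several times in different ways. Every time I make a breakthrough, I run into another problem. That is partly because I am still new to Java and struggle with collections such as Maps. So please bear with me.

I have two maps like this: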

Map1 - {ORGANIZATION=[Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department], LOCATION=[Bellwood, Alpharetta]}

Map2 - {ORGANIZATION=[Atlanta Police Department, Fulton Tax Commissioner, Fulton Health Department], LOCATION=[Alpharetta], PERSON=[Bellwood, Grady Hospital]}

The maps are defined as: LinkedHashMap<String, List<String>> sampleMap = new LinkedHashMap<String, List<String>>();

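I am comparing these two maps by their values. There are only 3 keys: ORGANIZATION, PERSON and LOCATION. Map1 is my gold set, and I am comparing Map2 against it. The problem I am facing is this: when I iterate over the values of the ORGANIZATION key in Map1 and look for matches in Map2, my first entry does have a partial match in Map2 (Fulton Tax Commissioner), but because the first entry of Map2 (Atlanta Police Department) does not match, I get the wrong result. (I am looking for both exact and partial matches.) The result of each comparison is an increment of the true positive, false positive or false negative counter, which ultimately lets me compute precision and recall for named entity recognition.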
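For reference, the counters described above feed into precision and recall roughly as follows. This is only a minimal sketch: the class name `PrecisionRecall` and its methods are illustrative and not part of the original code, and returning 0 when the denominator is zero is an assumption, not something the question specifies.

```java
// Hypothetical helper; the names are illustrative, not from the question's code.
final class PrecisionRecall {

    // precision = TP / (TP + FP); assumed to be 0 when nothing was predicted
    static double precision(int truePositives, int falsePositives) {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // recall = TP / (TP + FN); assumed to be 0 when the gold set is empty
    static double recall(int truePositives, int falseNegatives) {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    public static void main(String[] args) {
        // ORGANIZATION counts expected later in the question: TP = 2, FN = 1, FP = 1
        System.out.println("precision = " + precision(2, 1));
        System.out.println("recall    = " + recall(2, 1));
    }
}
```

Keeping the counters per key (ORGANIZATION, PERSON, LOCATION) allows the same two methods to be applied to each entity type separately.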

EDIT

The results I am expecting are:

Organization: 
True Positive Count = 2
False Negative Count = 1
False Positive Count = 1

Person:
False Positive Count = 2

Location:
True Positive Count = 1
False Negative Count = 1

The output I am currently getting is:

Organization:
True Positive Count = 1
False Negative Count = 2
False Positive Count = 0

Person:
True Positive Count = 0
False Negative Count = 0
False Positive Count = 2

Location:
True Positive Count = 0
False Negative Count = 1
False Positive Count = 0

CODE

private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) 
    {
        List<Integer> compareResults = new ArrayList<Integer>();  

         if (!annotationMap.entrySet().containsAll(rageMap.entrySet())){
               for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){
                   if (rageEntry.getKey().equals("ORGANIZATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           orgFalsePositiveCount++;
                       }
               }
                   if (rageEntry.getKey().equals("PERSON") && !(annotationMap.containsKey(rageEntry.getKey()))){
                      // System.out.println(rageEntry.getKey());
                      // System.out.println(annotationMap.entrySet());
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           perFalsePositiveCount++;
                       }
               }
                   if (rageEntry.getKey().equals("LOCATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           locFalsePositiveCount++;
                     }
                 }
              }
           }



               for (Entry<String, List<String>> entry : annotationMap.entrySet()){

                   int i_index = 0;
                   if (rageMap.entrySet().isEmpty()){
                       orgFalseNegativeCount++;
                       continue;
                   }

                  // for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){

                   if (entry.getKey().equals("ORGANIZATION")){
                       for(String val : entry.getValue()) {
                           if (rageMap.get(entry.getKey()) == null){
                               orgFalseNegativeCount++;
                               continue;
                       }
            recusion:      for (int i = i_index; i< rageMap.get(entry.getKey()).size();){
                                String rageVal = rageMap.get(entry.getKey()).get(i);
                               if(val.equals(rageVal)){
                                   orgTruePositiveCount++;
                                   i_index++;
                                   break recusion;
                       }

                           else if((val.length() > rageVal.length()) && val.contains(rageVal)){  //|| dataB.get(entryA.getKey()).contains(entryA.getValue())){
                               orgTruePositiveCount++;
                               i_index++;
                               break recusion;
                       }
                           else if((val.length() < rageVal.length()) && rageVal.contains(val)){
                               orgTruePositiveCount++;
                                i_index++;
                                break recusion;
                           }

                           else if(!val.contains(rageVal)){
                               orgFalseNegativeCount++;
                               i_index++;
                               break recusion;
                           }
                           else if(!rageVal.contains(val)){
                                 orgFalsePositiveCount++;
                                 i_index++;
                                 break recusion;
                             }


                      }
                    }
                   }

                  ......................... //(Same for person and location)


                    compareResults.add(orgTruePositiveCount); 
                    compareResults.add(orgFalseNegativeCount); 
                    compareResults.add(orgFalsePositiveCount);  
                    compareResults.add(perTruePositiveCount); 
                    compareResults.add(perFalseNegativeCount);  
                    compareResults.add(perFalsePositiveCount); 
                    compareResults.add(locTruePositiveCount); 
                    compareResults.add(locFalseNegativeCount);  
                    compareResults.add(locFalsePositiveCount); 

                    System.out.println(compareResults);
                    return compareResults;

            }  

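3 Answers:

Answer 0 (score: 1)

If I understood you correctly, this might help.

I created a custom string class that overrides equals so that partial matches count as equal: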

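import java.util.Objects;

// Wrapper whose equals() also accepts partial (substring) matches.
// Caveat: this equals() is not symmetric and is not consistent with hashCode(),
// so use it only with List.contains(), never as a key in hash-based collections.
public class MyCustomString {

    private String myString;

    public MyCustomString(String myString) {
        this.myString = myString;
    }

    public String getMyString() {
        return myString;
    }

    public void setMyString(String myString) {
        this.myString = myString;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final MyCustomString other = (MyCustomString) obj;
        if (this.myString == null || other.myString == null) {
            // Avoid a NullPointerException: nulls are equal only to each other
            return Objects.equals(this.myString, other.myString);
        }
        // True for an exact match or when this string is contained in the other one
        return other.myString.contains(this.myString);
    }

    // Delegate any other needed methods to the myString object.
    @Override
    public int hashCode() {
        int hash = 3;
        hash = 47 * hash + Objects.hashCode(this.myString);
        return hash;
    }
}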

Here is the code I tried, using the first part of your maps:

        LinkedHashMap<String, List<MyCustomString>> sampleMap1 = new LinkedHashMap<String, List<MyCustomString>>();
        sampleMap1.put("a", new ArrayList<>());
        sampleMap1.get("a").add(new MyCustomString("Fulton Tax Commissioner 's Office"));
        sampleMap1.get("a").add(new MyCustomString("Grady Hospital"));
        sampleMap1.get("a").add(new MyCustomString("Fulton Health Department"));

        LinkedHashMap<String, List<MyCustomString>> sampleMap2 = new LinkedHashMap<String, List<MyCustomString>>();
        sampleMap2.put("a", new ArrayList<>());
        sampleMap2.get("a").add(new MyCustomString("Atlanta Police Department"));
        sampleMap2.get("a").add(new MyCustomString("Fulton Tax Commissioner"));
        sampleMap2.get("a").add(new MyCustomString("Fulton Health Department"));

        HashMap<String, Integer> resultMap = new HashMap<String, Integer>();

        for (Map.Entry<String, List<MyCustomString>> entry : sampleMap1.entrySet()) {
            String key1 = entry.getKey();
            List<MyCustomString> value1 = entry.getValue();
            List<MyCustomString> singleListOfMap2 = sampleMap2.get(key1);
            if (singleListOfMap2 == null) {
                // Key missing from map 2: all gold entries are false negatives
                System.out.println("Number of false N: " + value1.size());
                continue; // avoids a NullPointerException in the loop below
            }
            int truePositives = 0;
            for (MyCustomString singleStringOfMap2 : singleListOfMap2) {
                if (value1.contains(singleStringOfMap2)) {
                    // True positive: the map 2 entry matches some gold entry
                    truePositives++;
                    System.out.println("true");
                } else {
                    // Map 2 entry with no counterpart in map 1: false positive
                    System.out.println("false P");
                }
            }
            System.out.println(truePositives + " - number of true");
            // false positives = singleListOfMap2.size() - truePositives
            // false negatives = value1.size() - truePositives
            // (assuming each gold entry is matched by at most one map 2 entry)
        }
        for (String string : sampleMap2.keySet()) {
            if (sampleMap1.get(string) == null) {
                // Key missing from map 1: all of these are false positives
                System.out.println("number of false P: " + sampleMap2.get(string).size());
            }
        }
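One caveat with an equals override of this kind is that the match only works in one direction, so it matters which object `contains` is called with. The sketch below illustrates this with a condensed copy of the wrapper; the names `EqualsCaveatDemo` and `PartialString` are made up for the illustration and are not from the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class EqualsCaveatDemo {

    // Condensed, hypothetical copy of the wrapper: "equal" when this text
    // is identical to, or contained in, the other text.
    static final class PartialString {
        final String text;

        PartialString(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof PartialString)) {
                return false;
            }
            PartialString other = (PartialString) obj;
            return other.text.contains(this.text);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(text);
        }
    }

    public static void main(String[] args) {
        PartialString shorter = new PartialString("Fulton Tax Commissioner");
        PartialString longer = new PartialString("Fulton Tax Commissioner 's Office");

        // The relation is one-directional:
        System.out.println(shorter.equals(longer)); // true
        System.out.println(longer.equals(shorter)); // false

        // List.contains(o) evaluates o.equals(element), so the argument
        // must be the shorter string for the partial match to be found.
        List<PartialString> gold = Arrays.asList(longer);
        List<PartialString> predicted = Arrays.asList(shorter);
        System.out.println(gold.contains(shorter));     // true
        System.out.println(predicted.contains(longer)); // false
    }
}
```

If partial matches are needed in both directions, a separate matching method that checks both `a.contains(b)` and `b.contains(a)` is probably a safer choice than overriding equals.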

Answer 1 (score: 1)

I wrote this class to compare the maps:

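import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapComparison<K, V> {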
    private final Map<K, Collection<ValueCounter>> temp;
    private final Map<K, Collection<V>> goldMap;
    private final Map<K, Collection<V>> comparedMap;
    private final BiPredicate<V, V> valueMatcher;

    public MapComparison(Map<K, Collection<V>> mapA, Map<K, Collection<V>> mapB, BiPredicate<V, V> valueMatcher) {
        this.goldMap = mapA;
        this.comparedMap = mapB;
        this.valueMatcher = valueMatcher;

        this.temp = new HashMap<>();

        goldMap.forEach((key, valueList) -> {
            temp.put(key, valueList.stream().map(value -> new ValueCounter(value, true)).collect(Collectors.toList()));
        });

        comparedMap.entrySet().stream().forEach(entry -> {

            K key = entry.getKey();
            Collection<V> valueList = entry.getValue();

            if(temp.containsKey(key)) {
                Collection<ValueCounter> existingMatches = temp.get(key);

                Stream<V> falsePositives = valueList.stream().filter(v -> existingMatches.stream().noneMatch(mv -> mv.match(v)));

                falsePositives.forEach(fp -> existingMatches.add(new ValueCounter(fp, false)));
            } else {
                temp.putIfAbsent(key, valueList.stream().map(value -> new ValueCounter(value, false)).collect(Collectors.toList()));
            }
        });
    }

    public String formatMatchedCounters() {
        StringBuilder sb = new StringBuilder();

        for(Entry<K, Collection<ValueCounter>> e : temp.entrySet()) {
            sb.append(e.getKey()).append(":");

            int[] counters = e.getValue().stream().collect(() -> new int[3], (a, b) -> {
                a[0] += b.truePositiveCount;
                a[1] += b.falsePositiveCount;
                a[2] += b.falseNegativeCount;
            }, (c, d) -> {
                c[0] += d[0];
                c[1] += d[1];
                c[2] += d[2];
            });
            sb.append(String.format("\ntruePositiveCount=%s\nfalsePositiveCount=%s\nfalseNegativeCount=%s\n\n", counters[0], counters[1], counters[2]));
        }
        return sb.toString();
    }


    private class ValueCounter {
        private final V goldValue;

        private int truePositiveCount = 0;
        private int falsePositiveCount = 0;
        private int falseNegativeCount = 0;

        ValueCounter(V value, boolean isInGoldMap) {
            this.goldValue = value;

            if(isInGoldMap) {
                falseNegativeCount = 1;
            } else {
                falsePositiveCount = 1;
            }
        }

        boolean match(V otherValue) {
            boolean result = valueMatcher.test(goldValue, otherValue);

            if(result) {
                truePositiveCount++;

                falseNegativeCount = 0;
            }
            return result;
        }
    }
}

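Basically, it builds a union of the items of both maps, where each item has its own mutable counters for counting matched values. The method formatMatchedCounters() simply iterates over each key and sums these counters.

The following test: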

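import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

import org.junit.Before;
import org.junit.Test;

public class MapComparisonTest {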

    private Map<String, Collection<String>> goldMap;
    private Map<String, Collection<String>> comparedMap;
    private BiPredicate<String, String> valueMatcher;

    @Before
    public void initMaps() {
        goldMap = new HashMap<>();
        goldMap.put("ORGANIZATION", Arrays.asList("Fulton Tax Commissioner", "Grady Hospital", "Fulton Health Department"));
        goldMap.put("LOCATION", Arrays.asList("Bellwood", "Alpharetta"));

        comparedMap = new HashMap<>();
        comparedMap.put("ORGANIZATION", Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department"));
        comparedMap.put("LOCATION", Arrays.asList("Alpharetta"));
        comparedMap.put("PERSON", Arrays.asList("Bellwood", "Grady Hospital"));

        valueMatcher = String::equalsIgnoreCase;
    }

    @Test
    public void test() {
        MapComparison<String, String> comparison = new MapComparison<>(goldMap, comparedMap, valueMatcher);

        System.out.println(comparison.formatMatchedCounters());
    }
}

produces this result:

ORGANIZATION:
truePositiveCount=2
falsePositiveCount=1
falseNegativeCount=1

LOCATION:
truePositiveCount=1
falsePositiveCount=0
falseNegativeCount=1

PERSON:
truePositiveCount=0
falsePositiveCount=2
falseNegativeCount=0

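Note that I don't know how you want to compare similar values (e.g. "Fulton Tax Commissioner" vs "Fulton Tax Commissioner s"), so I decided to leave that decision to the caller through the constructor signature (in this case, a BiPredicate parameter).

For example, the string comparison could be implemented with Levenshtein distance (here via StringUtils from Apache Commons Lang):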
valueMatcher = (s1, s2) -> StringUtils.getLevenshteinDistance(s1, s2) < 5;
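If adding a library dependency is not desirable, a matcher along the same lines can be sketched in plain Java. The class `ValueMatchers` and its method names below are hypothetical, and the containment-based matcher is an extra alternative that is not part of the answer above.

```java
import java.util.Locale;
import java.util.function.BiPredicate;

// Hypothetical helper class; none of these names come from the answer.
final class ValueMatchers {

    // Standard dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Matches when the edit distance is strictly below the given threshold.
    static BiPredicate<String, String> withinDistance(int maxExclusive) {
        return (s1, s2) -> levenshtein(s1, s2) < maxExclusive;
    }

    // Matches when either string contains the other, ignoring case.
    static BiPredicate<String, String> containsEitherWay() {
        return (s1, s2) -> {
            String x = s1.toLowerCase(Locale.ROOT);
            String y = s2.toLowerCase(Locale.ROOT);
            return x.contains(y) || y.contains(x);
        };
    }
}
```

Either factory result could be passed as the valueMatcher argument. Note that with a threshold of 5, "Fulton Tax Commissioner" and "Fulton Tax Commissioner 's Office" do not match, because they differ by 10 inserted characters; the containment-based matcher does accept that pair.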

Answer 2 (score: 1)

I came up with a simplified version. This is the output I get:

Organization:
    False Positive: Atlanta Police Department
    True Positive: Fulton Tax Commissioner
    True Positive: Fulton Health Department
    False Negative: Grady Hospital

Person:
    False Positive: Bellwood
    False Positive: Grady Hospital

Location:
    True Positive: Alpharetta
    False Negative: Bellwood

[2, 1, 1, 0, 0, 2, 1, 1, 0]

Here is the code I wrote:

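import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

public class MapCompare {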

    public static boolean listContains(List<String> annotationList, String value) {
        if(annotationList.contains(value)) {
            // 100% Match
            return true;
        }
        for(String s: annotationList) {
            if (s.contains(value) || value.contains(s)) {
                // Partial Match
                return true;
            }
        }
        return false;
    }

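    public static List<Integer> compareLists(List<String> annotationList, List<String> rageList){
        List<Integer> compareResults = new ArrayList<Integer>();
        // Treat a missing key as an empty list so its counterpart entries are still
        // counted (e.g. PERSON absent from the gold map yields false positives).
        if(annotationList == null) annotationList = new ArrayList<String>();
        if(rageList == null) rageList = new ArrayList<String>();
        int truePositiveCount = 0;
        int falseNegativeCount = 0;
        int falsePositiveCount = 0;
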
        for(String r: rageList) {
            if(listContains(annotationList, r)) {
                System.out.println("\tTrue Positive: " + r);
                truePositiveCount ++;
            } else {
                System.out.println("\tFalse Positive: " + r);
                falsePositiveCount ++;
            }
        }

        for(String s: annotationList) {
            if(listContains(rageList, s) == false){
                System.out.println("\tFalse Negative: " + s);
                falseNegativeCount ++;
            }
        }

        compareResults.add(truePositiveCount);
        compareResults.add(falseNegativeCount);
        compareResults.add(falsePositiveCount);

        System.out.println();

        return compareResults;
    }

    private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) {
        List<Integer> compareResults = new ArrayList<Integer>();
        System.out.println("Organization:");
        compareResults.addAll(compareLists(annotationMap.get("ORGANIZATION"), rageMap.get("ORGANIZATION")));
        System.out.println("Person:");
        compareResults.addAll(compareLists(annotationMap.get("PERSON"), rageMap.get("PERSON")));
        System.out.println("Location:");
        compareResults.addAll(compareLists(annotationMap.get("LOCATION"), rageMap.get("LOCATION")));
        System.out.println(compareResults);
        return compareResults;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<String>> Map1 = new LinkedHashMap<>();
        List<String> m1l1 = Arrays.asList("Fulton Tax Commissioner's Office", "Grady Hospital", "Fulton Health Department");
        List<String> m1l2 = Arrays.asList("Bellwood", "Alpharetta");
        List<String> m1l3 = Arrays.asList();
        Map1.put("ORGANIZATION", m1l1);
        Map1.put("LOCATION", m1l2);
        Map1.put("PERSON", m1l3);

        LinkedHashMap<String, List<String>> Map2 = new LinkedHashMap<>();
        List<String> m2l1 = Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department");
        List<String> m2l2 = Arrays.asList("Alpharetta");
        List<String> m2l3 = Arrays.asList("Bellwood", "Grady Hospital");

        Map2.put("ORGANIZATION", m2l1);
        Map2.put("LOCATION", m2l2);
        Map2.put("PERSON", m2l3);

        compareMaps(Map1, Map2);

    }

}

Hope this helps!