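I have asked this question several times in different ways. Every time I make a breakthrough, I run into another problem. Part of the reason is that I'm still new to Java and struggle with collections like Maps, so please bear with me.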
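I have two maps like these:
Map1 - {ORGANIZATION=[Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department], LOCATION=[Bellwood, Alpharetta]}
Map2 - {ORGANIZATION=[Atlanta Police Department, Fulton Tax Commissioner, Fulton Health Department], LOCATION=[Alpharetta], PERSON=[Bellwood, Grady Hospital]}
The maps are defined as: LinkedHashMap<String, List<String>> sampleMap = new LinkedHashMap<String, List<String>>();
I compare the two maps based on their values, and there are only 3 keys: ORGANIZATION, PERSON and LOCATION. Map1 is my gold standard, and I compare Map2 against it. The problem: when I iterate over the values of the ORGANIZATION key in Map1 and look for a match in Map2, I get the wrong result. My first entry has a partial match in Map2 (Fulton Tax Commissioner), but because the first entry of Map2 (Atlanta Police Department) does not match, it is not counted (I'm looking for both exact and partial matches). The comparison should increment true positive, false positive and false negative counters, which ultimately let me compute precision and recall for named entity recognition.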
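For reference, the standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below is a minimal illustration; the class and method names are hypothetical, and returning 0 when a denominator is 0 (e.g. PERSON, which has no gold values) is an assumed convention, not something from the question.

```java
import java.util.Locale;

// Hypothetical helper: precision and recall from the three counters.
public class NerMetrics {
    // precision = TP / (TP + FP); assumed to be 0 when nothing was predicted
    public static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // recall = TP / (TP + FN); assumed to be 0 when there was nothing to find
    public static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Expected ORGANIZATION counts from the question: TP = 2, FN = 1, FP = 1
        System.out.printf(Locale.ROOT, "precision=%.3f recall=%.3f%n",
                precision(2, 1), recall(2, 1)); // precision=0.667 recall=0.667
    }
}
```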
EDIT
The result I'm expecting is:
Organization:
True Positive Count = 2
False Negative Count = 1
False Positive Count = 1
Person:
False Positive Count = 2
Location:
True Positive Count = 1
False Negative Count = 1
The output I currently get is:
Organization:
True Positive Count = 1
False Negative Count = 2
False Positive Count = 0
Person:
True Positive Count = 0
False Negative Count = 0
False Positive Count = 2
Location:
True Positive Count = 0
False Negative Count = 1
False Positive Count = 0
CODE
private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap)
{
    List<Integer> compareResults = new ArrayList<Integer>();
    if (!annotationMap.entrySet().containsAll(rageMap.entrySet())){
        for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){
            if (rageEntry.getKey().equals("ORGANIZATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                for (int j = 0; j< rageEntry.getValue().size(); j++) {
                    orgFalsePositiveCount++;
                }
            }
            if (rageEntry.getKey().equals("PERSON") && !(annotationMap.containsKey(rageEntry.getKey()))){
                // System.out.println(rageEntry.getKey());
                // System.out.println(annotationMap.entrySet());
                for (int j = 0; j< rageEntry.getValue().size(); j++) {
                    perFalsePositiveCount++;
                }
            }
            if (rageEntry.getKey().equals("LOCATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                for (int j = 0; j< rageEntry.getValue().size(); j++) {
                    locFalsePositiveCount++;
                }
            }
        }
    }
    for (Entry<String, List<String>> entry : annotationMap.entrySet()){
        int i_index = 0;
        if (rageMap.entrySet().isEmpty()){
            orgFalseNegativeCount++;
            continue;
        }
        // for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){
        if (entry.getKey().equals("ORGANIZATION")){
            for(String val : entry.getValue()) {
                if (rageMap.get(entry.getKey()) == null){
                    orgFalseNegativeCount++;
                    continue;
                }
                recusion: for (int i = i_index; i< rageMap.get(entry.getKey()).size();){
                    String rageVal = rageMap.get(entry.getKey()).get(i);
                    if(val.equals(rageVal)){
                        orgTruePositiveCount++;
                        i_index++;
                        break recusion;
                    }
                    else if((val.length() > rageVal.length()) && val.contains(rageVal)){ //|| dataB.get(entryA.getKey()).contains(entryA.getValue())){
                        orgTruePositiveCount++;
                        i_index++;
                        break recusion;
                    }
                    else if((val.length() < rageVal.length()) && rageVal.contains(val)){
                        orgTruePositiveCount++;
                        i_index++;
                        break recusion;
                    }
                    else if(!val.contains(rageVal)){
                        orgFalseNegativeCount++;
                        i_index++;
                        break recusion;
                    }
                    else if(!rageVal.contains(val)){
                        orgFalsePositiveCount++;
                        i_index++;
                        break recusion;
                    }
                }
            }
        }
        ......................... //(Same for person and location)
    compareResults.add(orgTruePositiveCount);
    compareResults.add(orgFalseNegativeCount);
    compareResults.add(orgFalsePositiveCount);
    compareResults.add(perTruePositiveCount);
    compareResults.add(perFalseNegativeCount);
    compareResults.add(perFalsePositiveCount);
    compareResults.add(locTruePositiveCount);
    compareResults.add(locFalseNegativeCount);
    compareResults.add(locFalsePositiveCount);
    System.out.println(compareResults);
    return compareResults;
}
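Going by the current output, the cause appears to be the labeled `recusion` loop: every branch ends in `break recusion`, and the conditions between them cover every possible pair of strings, so the loop body runs exactly once. Each Map1 value is therefore compared only with the Map2 value at the same position (`i_index`), never with the rest of the list. The sketch below (hypothetical class and method names) contrasts that position-aligned comparison with a full scan using the same exact-or-substring rule. On the ORGANIZATION lists it gives 1 versus 2 true positives, which lines up with the current and expected counts.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo contrasting the two comparison strategies.
public class MatchScanDemo {
    // Exact match, or either string contained in the other (the question's rule).
    public static boolean matches(String gold, String candidate) {
        return gold.equals(candidate) || gold.contains(candidate) || candidate.contains(gold);
    }

    // What the labeled loop effectively does: gold value i is only compared with candidate i.
    public static int positionAlignedTruePositives(List<String> gold, List<String> candidates) {
        int tp = 0;
        for (int i = 0; i < gold.size() && i < candidates.size(); i++) {
            if (matches(gold.get(i), candidates.get(i))) {
                tp++;
            }
        }
        return tp;
    }

    // Full scan: a gold value is found if ANY candidate matches it.
    public static int scanningTruePositives(List<String> gold, List<String> candidates) {
        int tp = 0;
        for (String g : gold) {
            if (candidates.stream().anyMatch(c -> matches(g, c))) {
                tp++;
            }
        }
        return tp;
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("Fulton Tax Commissioner 's Office", "Grady Hospital", "Fulton Health Department");
        List<String> candidates = Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department");
        System.out.println(positionAlignedTruePositives(gold, candidates)); // 1
        System.out.println(scanningTruePositives(gold, candidates));        // 2
    }
}
```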
Answer 0 (score: 1)
If I understood you correctly, this might help.
I created a custom string class that overrides equals to allow partial matches:
import java.util.Objects;

public class MyCustomString {
    private String myString;

    public MyCustomString(String myString) {
        this.myString = myString;
    }

    public String getMyString() {
        return myString;
    }

    public void setMyString(String myString) {
        this.myString = myString;
    }

    // "Equal" if the strings are identical, or if the other string contains this one (partial match).
    // Note: this is one-directional, so a.equals(b) can differ from b.equals(a).
    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final MyCustomString other = (MyCustomString) obj;
        if (!Objects.equals(this.myString, other.myString) && !other.myString.contains(this.myString)) {
            return false;
        }
        return true;
    }

    // Delegate other needed methods to myString if required.
    // Caveat: partially matching objects still get different hash codes, so this class works
    // with List.contains() but should not be used as a HashMap key or in a HashSet.
    @Override
    public int hashCode() {
        int hash = 3;
        hash = 47 * hash + Objects.hashCode(this.myString);
        return hash;
    }
}
Here is the code where I tried it with the first part of your maps:
LinkedHashMap<String, List<MyCustomString>> sampleMap1 = new LinkedHashMap<>();
sampleMap1.put("a", new ArrayList<>());
sampleMap1.get("a").add(new MyCustomString("Fulton Tax Commissioner 's Office"));
sampleMap1.get("a").add(new MyCustomString("Grady Hospital"));
sampleMap1.get("a").add(new MyCustomString("Fulton Health Department"));
LinkedHashMap<String, List<MyCustomString>> sampleMap2 = new LinkedHashMap<>();
sampleMap2.put("a", new ArrayList<>());
sampleMap2.get("a").add(new MyCustomString("Atlanta Police Department"));
sampleMap2.get("a").add(new MyCustomString("Fulton Tax Commissioner"));
sampleMap2.get("a").add(new MyCustomString("Fulton Health Department"));
for (Map.Entry<String, List<MyCustomString>> entry : sampleMap1.entrySet()) {
    String key1 = entry.getKey();
    List<MyCustomString> value1 = entry.getValue();
    List<MyCustomString> singleListOfMap2 = sampleMap2.get(key1);
    if (singleListOfMap2 == null) {
        // all entries are false negatives
        System.out.println("Number of false N: " + value1.size());
        continue; // nothing to compare; also avoids a NullPointerException below
    }
    int trueCount = 0;
    for (MyCustomString singleStringOfMap2 : singleListOfMap2) {
        // contains() calls singleStringOfMap2.equals(element), i.e. "is this Map2 value
        // equal to, or contained in, some Map1 value?"
        if (value1.contains(singleStringOfMap2)) {
            // true positive
            trueCount++;
            System.out.println("true");
        } else {
            // false positive: present in Map2 but not matched in Map1
            System.out.println("false P");
        }
    }
    System.out.println(trueCount + " - number of true");
    // false positive = size - true
}
for (String string : sampleMap2.keySet()) {
    if (sampleMap1.get(string) == null) {
        // all of these are false positives
        System.out.println("Number of false P: " + sampleMap2.get(string).size());
    }
}
Answer 1 (score: 1)
I wrote this class to compare the maps:
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapComparison<K, V> {
    private final Map<K, Collection<ValueCounter>> temp;
    private final Map<K, Collection<V>> goldMap;
    private final Map<K, Collection<V>> comparedMap;
    private final BiPredicate<V, V> valueMatcher;

    public MapComparison(Map<K, Collection<V>> mapA, Map<K, Collection<V>> mapB, BiPredicate<V, V> valueMatcher) {
        this.goldMap = mapA;
        this.comparedMap = mapB;
        this.valueMatcher = valueMatcher;
        this.temp = new HashMap<>();
        // Every gold value starts as a false negative until something matches it.
        goldMap.forEach((key, valueList) -> {
            temp.put(key, valueList.stream().map(value -> new ValueCounter(value, true)).collect(Collectors.toList()));
        });
        comparedMap.entrySet().stream().forEach(entry -> {
            K key = entry.getKey();
            Collection<V> valueList = entry.getValue();
            if(temp.containsKey(key)) {
                Collection<ValueCounter> existingMatches = temp.get(key);
                // Compared values that match no gold value become false positives.
                Stream<V> falsePositives = valueList.stream().filter(v -> existingMatches.stream().noneMatch(mv -> mv.match(v)));
                falsePositives.forEach(fp -> existingMatches.add(new ValueCounter(fp, false)));
            } else {
                // Key missing from the gold map: all its values are false positives.
                temp.putIfAbsent(key, valueList.stream().map(value -> new ValueCounter(value, false)).collect(Collectors.toList()));
            }
        });
    }

    public String formatMatchedCounters() {
        StringBuilder sb = new StringBuilder();
        for(Entry<K, Collection<ValueCounter>> e : temp.entrySet()) {
            sb.append(e.getKey()).append(":");
            int[] counters = e.getValue().stream().collect(() -> new int[3], (a, b) -> {
                a[0] += b.truePositiveCount;
                a[1] += b.falsePositiveCount;
                a[2] += b.falseNegativeCount;
            }, (c, d) -> {
                c[0] += d[0];
                c[1] += d[1];
                c[2] += d[2];
            });
            sb.append(String.format("\ntruePositiveCount=%s\nfalsePositiveCount=%s\nfalseNegativeCount=%s\n\n", counters[0], counters[1], counters[2]));
        }
        return sb.toString();
    }

    private class ValueCounter {
        private final V goldValue;
        private int truePositiveCount = 0;
        private int falsePositiveCount = 0;
        private int falseNegativeCount = 0;

        ValueCounter(V value, boolean isInGoldMap) {
            this.goldValue = value;
            if(isInGoldMap) {
                falseNegativeCount = 1;
            } else {
                falsePositiveCount = 1;
            }
        }

        boolean match(V otherValue) {
            boolean result = valueMatcher.test(goldValue, otherValue);
            if(result) {
                truePositiveCount++;
                falseNegativeCount = 0;
            }
            return result;
        }
    }
}
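Basically, it builds the union of the entries of both maps, where each value gets its own mutable counters that record matches. The method formatMatchedCounters() then just iterates over each key and sums those counters.
The following test: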
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import org.junit.Before;
import org.junit.Test;

public class MapComparisonTest {
    private Map<String, Collection<String>> goldMap;
    private Map<String, Collection<String>> comparedMap;
    private BiPredicate<String, String> valueMatcher;

    @Before
    public void initMaps() {
        goldMap = new HashMap<>();
        goldMap.put("ORGANIZATION", Arrays.asList("Fulton Tax Commissioner", "Grady Hospital", "Fulton Health Department"));
        goldMap.put("LOCATION", Arrays.asList("Bellwood", "Alpharetta"));
        comparedMap = new HashMap<>();
        comparedMap.put("ORGANIZATION", Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department"));
        comparedMap.put("LOCATION", Arrays.asList("Alpharetta"));
        comparedMap.put("PERSON", Arrays.asList("Bellwood", "Grady Hospital"));
        valueMatcher = String::equalsIgnoreCase;
    }

    @Test
    public void test() {
        MapComparison<String, String> comparison = new MapComparison<>(goldMap, comparedMap, valueMatcher);
        System.out.println(comparison.formatMatchedCounters());
    }
}
produces this result:
ORGANIZATION:
truePositiveCount=2
falsePositiveCount=1
falseNegativeCount=1
LOCATION:
truePositiveCount=1
falsePositiveCount=0
falseNegativeCount=1
PERSON:
truePositiveCount=0
falsePositiveCount=2
falseNegativeCount=0
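Note that I don't know how you want to compare similar values (e.g. "Fulton Tax Commissioner" vs "Fulton Tax Commissioner s"), so I left that decision to the caller by putting it in the constructor signature (in this case, a BiPredicate parameter).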
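As a sketch of one such decision, assuming the question's rule (exact match, or either string contained in the other) and ignoring case as the test above does, a `valueMatcher` could look like this. The class and constant names are hypothetical.

```java
import java.util.Locale;
import java.util.function.BiPredicate;

// Hypothetical valueMatcher implementing "exact or partial" matching.
public class PartialMatcherExample {
    // Case-insensitive: equal, or either string contains the other.
    public static final BiPredicate<String, String> PARTIAL_MATCHER = (gold, candidate) -> {
        String g = gold.toLowerCase(Locale.ROOT);
        String c = candidate.toLowerCase(Locale.ROOT);
        return g.equals(c) || g.contains(c) || c.contains(g);
    };

    public static void main(String[] args) {
        System.out.println(PARTIAL_MATCHER.test("Fulton Tax Commissioner 's Office", "Fulton Tax Commissioner")); // true
        System.out.println(PARTIAL_MATCHER.test("Grady Hospital", "Atlanta Police Department"));                // false
    }
}
```

Passed as `valueMatcher` in the test above, this should let the gold list keep the question's original value "Fulton Tax Commissioner 's Office" and still count it as a true positive.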
For example, you could implement the string comparison with the Levenshtein distance (StringUtils from Apache Commons Lang):
valueMatcher = (s1, s2) -> StringUtils.getLevenshteinDistance(s1, s2) < 5;
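If you prefer not to add Apache Commons Lang just for this, the Levenshtein distance is a standard dynamic-programming algorithm that fits in a few lines. The sketch below is a generic two-row implementation (the class name is hypothetical), wired into a `valueMatcher` with the threshold of 5 from the line above.

```java
import java.util.function.BiPredicate;

// Hypothetical helper: classic Levenshtein edit distance (insert, delete, substitute; cost 1 each).
public class Levenshtein {
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The current row becomes the previous row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        BiPredicate<String, String> valueMatcher = (s1, s2) -> distance(s1, s2) < 5;
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(valueMatcher.test("Fulton Tax Commissioner", "Fulton Tax Commissioner s")); // true
    }
}
```

One caveat with a fixed threshold: the distance grows with the length of any extra suffix, so "Fulton Tax Commissioner" vs "Fulton Tax Commissioner 's Office" has a distance of 10 and would not match at `< 5`. A substring check or a threshold relative to string length may suit this data better.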
Answer 2 (score: 1)
I came up with a simplified version. This is the output I got:
Organization:
False Positive: Atlanta Police Department
True Positive: Fulton Tax Commissioner
True Positive: Fulton Health Department
False Negative: Grady Hospital
Person:
False Positive: Bellwood
False Positive: Grady Hospital
Location:
True Positive: Alpharetta
False Negative: Bellwood
[2, 1, 1, 0, 0, 2, 1, 1, 0]
Here is the code I wrote:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;

public class MapCompare {
    // True if value matches an entry exactly, or either string contains the other.
    public static boolean listContains(List<String> annotationList, String value) {
        if (annotationList.contains(value)) {
            // 100% match
            return true;
        }
        for (String s : annotationList) {
            if (s.contains(value) || value.contains(s)) {
                // partial match
                return true;
            }
        }
        return false;
    }

    // Returns [truePositive, falseNegative, falsePositive] for one entity type.
    public static List<Integer> compareLists(List<String> annotationList, List<String> rageList) {
        List<Integer> compareResults = new ArrayList<Integer>();
        // A missing key means "no entities of this type": treat it as an empty list so that
        // unmatched values on the other side are still counted (instead of returning 0, 0, 0).
        if (annotationList == null) annotationList = Collections.emptyList();
        if (rageList == null) rageList = Collections.emptyList();
        int truePositiveCount = 0;
        int falseNegativeCount = 0;
        int falsePositiveCount = 0;
        for (String r : rageList) {
            if (listContains(annotationList, r)) {
                System.out.println("\tTrue Positive: " + r);
                truePositiveCount++;
            } else {
                System.out.println("\tFalse Positive: " + r);
                falsePositiveCount++;
            }
        }
        for (String s : annotationList) {
            if (!listContains(rageList, s)) {
                System.out.println("\tFalse Negative: " + s);
                falseNegativeCount++;
            }
        }
        compareResults.add(truePositiveCount);
        compareResults.add(falseNegativeCount);
        compareResults.add(falsePositiveCount);
        System.out.println();
        return compareResults;
    }

    private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) {
        List<Integer> compareResults = new ArrayList<Integer>();
        System.out.println("Organization:");
        compareResults.addAll(compareLists(annotationMap.get("ORGANIZATION"), rageMap.get("ORGANIZATION")));
        System.out.println("Person:");
        compareResults.addAll(compareLists(annotationMap.get("PERSON"), rageMap.get("PERSON")));
        System.out.println("Location:");
        compareResults.addAll(compareLists(annotationMap.get("LOCATION"), rageMap.get("LOCATION")));
        System.out.println(compareResults);
        return compareResults;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<String>> Map1 = new LinkedHashMap<>();
        List<String> m1l1 = Arrays.asList("Fulton Tax Commissioner's Office", "Grady Hospital", "Fulton Health Department");
        List<String> m1l2 = Arrays.asList("Bellwood", "Alpharetta");
        List<String> m1l3 = Arrays.asList();
        Map1.put("ORGANIZATION", m1l1);
        Map1.put("LOCATION", m1l2);
        Map1.put("PERSON", m1l3);
        LinkedHashMap<String, List<String>> Map2 = new LinkedHashMap<>();
        List<String> m2l1 = Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department");
        List<String> m2l2 = Arrays.asList("Alpharetta");
        List<String> m2l3 = Arrays.asList("Bellwood", "Grady Hospital");
        Map2.put("ORGANIZATION", m2l1);
        Map2.put("LOCATION", m2l2);
        Map2.put("PERSON", m2l3);
        compareMaps(Map1, Map2);
    }
}
Hope this helps!