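I need some help implementing these formulas. I think I implemented them correctly, but for some reason I am not getting the expected results.
Below is the code for the NMI, I and H functions, respectively. Are the formulas implemented correctly? Thanks.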
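The formulas themselves are not reproduced in the text. Judging only from the code in this thread, they are presumably the usual definitions of normalized mutual information between a clustering C and a set of events E over N documents. The symbols below are an assumption reconstructed from the code, not a copy of the original formulas:

```latex
\mathrm{NMI}(C, E) = \frac{I(C, E)}{\bigl(H(C) + H(E)\bigr) / 2}

I(C, E) = \sum_{e \in E} \sum_{c \in C}
          \frac{|e \cap c|}{N} \,
          \log \frac{N \, |e \cap c|}{|e| \, |c|}

H(C) = - \sum_{c \in C} \frac{|c|}{N} \, \log \frac{|c|}{N}
```

Here |e ∩ c| corresponds to `common_docs`, N to `totalN`, and terms with |e ∩ c| = 0 are skipped, as in the code.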
    int totalN = getTotalN(events);
    double h1 = H(clusters, totalN);
    double h2 = H(events, totalN);
    double valueI = I(clusters, events, totalN);
    double value_NMI = valueI / (double) ((h1 + h2) / (double) 2);
    System.out.println("NMI: " + value_NMI);

    static public double I(HashMap<String, ArrayList<String>> clusters, HashMap<String, ArrayList<String>> events, int totalN) {
        //store sorted content to contents
        Iterator<Map.Entry<String, ArrayList<String>>> it = events.entrySet().iterator();
        Iterator<Map.Entry<String, ArrayList<String>>> it2 = clusters.entrySet().iterator();
        String key;
        ArrayList<String> event;
        ArrayList<String> cluster;
        double valueI = 0;
        while (it.hasNext()) {
            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            key = mapItem.getKey();
            //if cluster doesn't exist
            //if(!clusters.containsKey(key)) continue;
            //cluster = clusters.get(key);
            event = mapItem.getValue();
            while (it2.hasNext()) {
                Map.Entry<String, ArrayList<String>> mapItem2 = it2.next();
                cluster = mapItem2.getValue();
                float common_docs = 0;
                for (int i = 0; i < event.size(); i++) {
                    for (int j = 0; j < cluster.size(); j++) {
                        if (event.get(i).equals(cluster.get(j))) {
                            common_docs = common_docs + 1;
                            break;
                        }
                    }
                }
                if (common_docs != 0) valueI = valueI + ( ( common_docs / (float) totalN) * Math.log((common_docs * totalN) / (float) (event.size() * cluster.size())) );
            }
        }
        return valueI;
    }

    static public double H(HashMap<String, ArrayList<String>> clusters, int totalN) {
        //store sorted content to contents
        Iterator<Map.Entry<String, ArrayList<String>>> it = clusters.entrySet().iterator();
        ArrayList<String> cluster;
        double entropy = 0;
        while (it.hasNext()) {
            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            cluster = mapItem.getValue();
            double ratio = cluster.size() / (float) totalN;
            entropy = entropy + ratio * Math.log(ratio);
        }
        return -entropy;
    }

    static public int getTotalN(HashMap<String, ArrayList<String>> dataset) {
        int totalN = 0;
        Iterator<Map.Entry<String, ArrayList<String>>> it = dataset.entrySet().iterator();
        ArrayList<String> item;
        while (it.hasNext()) {
            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            item = mapItem.getValue();
            for (int i = 0; i < item.size(); i++) {
                totalN = totalN + 1;
            }
        }
        return totalN;
    }
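Answer 0 (score: 3)
I would guess not. I only checked I(C,E): it2 is not reset on each iteration of the outer loop, which a nested sum requires. After the first event, it2 is already exhausted, so the inner loop never runs for any of the remaining events and their terms are missing from the sum.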
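The effect can be reproduced in isolation. The following is a minimal sketch rather than the asker's code: the class name `IteratorResetDemo` and both method names are made up for illustration. It only counts how many (outer, inner) pairs are visited when the inner iterator is created once versus once per outer iteration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorResetDemo {

    // Inner iterator created once, outside the outer loop (the pattern in the question).
    static int pairsWithSharedIterator(List<String> outer, List<String> inner) {
        int pairs = 0;
        Iterator<String> it2 = inner.iterator();
        for (String ignored : outer) {
            // After the first outer element, it2.hasNext() is already false.
            while (it2.hasNext()) {
                it2.next();
                pairs++;
            }
        }
        return pairs;
    }

    // Inner iterator created anew for every outer element (what a nested sum needs).
    static int pairsWithFreshIterator(List<String> outer, List<String> inner) {
        int pairs = 0;
        for (String ignored : outer) {
            Iterator<String> it2 = inner.iterator();
            while (it2.hasNext()) {
                it2.next();
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("e1", "e2", "e3");
        List<String> clusters = Arrays.asList("c1", "c2");
        System.out.println(pairsWithSharedIterator(events, clusters)); // prints 2
        System.out.println(pairsWithFreshIterator(events, clusters));  // prints 6
    }
}
```

With three events and two clusters, a full double sum has six terms, but the shared iterator only ever produces the two that belong to the first event.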
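Answer 1 (score: 1)
The iterator it2 in method I should be initialized inside the outer loop. You can use the "foreach" notation to simplify the code and avoid this kind of mistake: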
    static public double I(HashMap<String, ArrayList<String>> clusters, HashMap<String, ArrayList<String>> events, int totalN) {
        String key;
        ArrayList<String> event;
        ArrayList<String> cluster;
        double valueI = 0;
        // Outer sum over events.
        for (Map.Entry<String, ArrayList<String>> mapItem : events.entrySet()) {
            key = mapItem.getKey();
            //if cluster doesn't exist
            //if(!clusters.containsKey(key)) continue;
            //cluster = clusters.get(key);
            event = mapItem.getValue();
            // Inner sum over clusters; for-each starts a fresh pass for every event.
            for (Map.Entry<String, ArrayList<String>> mapItem2 : clusters.entrySet()) {
                cluster = mapItem2.getValue();
                // Count the documents of this event that also occur in this cluster.
                float common_docs = 0;
                for (int i = 0; i < event.size(); i++) {
                    for (int j = 0; j < cluster.size(); j++) {
                        if (event.get(i).equals(cluster.get(j))) {
                            common_docs = common_docs + 1;
                            break;
                        }
                    }
                }
                // Skip empty intersections, since log(0) is undefined.
                if (common_docs != 0) {
                    valueI = valueI + ((common_docs / (float) totalN) * Math.log((common_docs * totalN) / (float) (event.size() * cluster.size())));
                }
            }
        }
        return valueI;
    }

    static public double H(HashMap<String, ArrayList<String>> clusters, int totalN) {
        ArrayList<String> cluster;
        double entropy = 0;
        // Sum ratio * log(ratio) over all groups; the sign is flipped on return.
        for (Map.Entry<String, ArrayList<String>> mapItem : clusters.entrySet()) {
            cluster = mapItem.getValue();
            double ratio = cluster.size() / (float) totalN;
            entropy = entropy + ratio * Math.log(ratio);
        }
        return -entropy;
    }

    static public int getTotalN(HashMap<String, ArrayList<String>> dataset) {
        int totalN = 0;
        ArrayList<String> item;
        // Total number of documents across all groups.
        for (Map.Entry<String, ArrayList<String>> mapItem : dataset.entrySet()) {
            item = mapItem.getValue();
            for (int i = 0; i < item.size(); i++) {
                totalN = totalN + 1;
            }
        }
        return totalN;
    }
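One way to check such an implementation is to run it on tiny inputs whose result is known in advance. The sketch below is a hypothetical, self-contained variant, not the code from this thread: the class `NmiSketch` and its method names are invented, it takes `Map`/`List` instead of `HashMap`/`ArrayList`, it uses `double` throughout, and it computes the overlap with a `HashSet`, which assumes document ids are unique within each group.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NmiSketch {

    // Entropy of a partition: -sum over groups of p * ln(p), with p = |group| / N.
    static double entropy(Map<String, List<String>> partition, int totalN) {
        double h = 0;
        for (List<String> group : partition.values()) {
            if (group.isEmpty()) continue; // treat 0 * ln(0) as 0
            double p = group.size() / (double) totalN;
            h -= p * Math.log(p);
        }
        return h;
    }

    // Mutual information: double sum over every (event, cluster) pair.
    static double mutualInformation(Map<String, List<String>> clusters,
                                    Map<String, List<String>> events, int totalN) {
        double mi = 0;
        for (List<String> event : events.values()) {
            for (List<String> cluster : clusters.values()) {
                Set<String> common = new HashSet<>(event);
                common.retainAll(cluster); // documents present in both groups
                int n = common.size();
                if (n == 0) continue;      // empty overlap contributes nothing
                mi += (n / (double) totalN)
                        * Math.log((n * (double) totalN)
                                / ((double) event.size() * cluster.size()));
            }
        }
        return mi;
    }

    // NMI normalized by the mean of the two entropies.
    // The result is NaN when both partitions consist of a single group (0 / 0).
    static double nmi(Map<String, List<String>> clusters, Map<String, List<String>> events) {
        int totalN = 0;
        for (List<String> event : events.values()) totalN += event.size();
        double meanEntropy = (entropy(clusters, totalN) + entropy(events, totalN)) / 2;
        return mutualInformation(clusters, events, totalN) / meanEntropy;
    }

    public static void main(String[] args) {
        Map<String, List<String>> events = new HashMap<>();
        events.put("A", Arrays.asList("d1", "d2"));
        events.put("B", Arrays.asList("d3", "d4"));

        // Same grouping as the events: NMI should come out at (approximately) 1.
        Map<String, List<String>> identical = new HashMap<>();
        identical.put("x", Arrays.asList("d1", "d2"));
        identical.put("y", Arrays.asList("d3", "d4"));

        // Each cluster takes one document from each event: NMI should be 0.
        Map<String, List<String>> independent = new HashMap<>();
        independent.put("x", Arrays.asList("d1", "d3"));
        independent.put("y", Arrays.asList("d2", "d4"));

        System.out.println("identical:   " + nmi(identical, events));
        System.out.println("independent: " + nmi(independent, events));
    }
}
```

If a version with the shared-iterator bug is fed the same two inputs, the "identical" case no longer reaches 1, because only the terms for the first event are summed.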
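Answer 2 (score: 0)
My guess is that you are not getting the expected results because of floating-point rounding errors (see this for details). I have not looked at the code inside the methods that implement the three functions, but I noticed that you mix float and double, which may be causing you trouble. You may want to use BigDecimal instead.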
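To get a feel for how large this effect is, the ratio from H can be computed both ways. This is only an illustrative sketch with invented names (`FloatVsDoubleDemo`, `ratioViaFloat`, `ratioViaDouble`); it does not measure the asker's actual data. Note also that `BigDecimal` has no built-in logarithm, so the `Math.log` calls would still have to go through `double`.

```java
class FloatVsDoubleDemo {

    // Mirrors the cast used in the question: int / float, then widened to double.
    static double ratioViaFloat(int count, int totalN) {
        return count / (float) totalN;
    }

    // Same ratio computed in double precision from the start.
    static double ratioViaDouble(int count, int totalN) {
        return count / (double) totalN;
    }

    public static void main(String[] args) {
        double viaFloat = ratioViaFloat(1, 3);
        double viaDouble = ratioViaDouble(1, 3);
        System.out.println("float-based ratio:  " + viaFloat);
        System.out.println("double-based ratio: " + viaDouble);
        System.out.println("difference:         " + Math.abs(viaFloat - viaDouble));
    }
}
```

For a ratio such as 1/3 the two results differ only around the eighth decimal place, so casting to `double` instead of `float` is a cheap cleanup, while an error of that size would not by itself change an NMI value noticeably.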