I need some help with the implementation of these formulas

Time: 2011-04-11 12:40:56

Tags: java

I need some help implementing these formulas. I think I implemented them correctly, but for some reason I am not getting the expected results:

[Image: the formulas for NMI, I, and H]
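The formula image itself is not available in this copy. Reading the definitions back from the code in this question, and assuming the code mirrors the formulas term by term (with N standing for totalN, C for the clusters, and E for the events), they would be:

```latex
\begin{align}
\mathrm{NMI}(C, E) &= \frac{I(C, E)}{\bigl(H(C) + H(E)\bigr) / 2} \\
I(C, E) &= \sum_{e \in E} \sum_{c \in C} \frac{|e \cap c|}{N} \log \frac{|e \cap c| \cdot N}{|e| \cdot |c|} \\
H(C) &= -\sum_{c \in C} \frac{|c|}{N} \log \frac{|c|}{N}
\end{align}
```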

This is the code for the NMI, I, and H functions, respectively. Are the formulas implemented correctly? Thanks.

int totalN = getTotalN(events);
double h1 = H(clusters, totalN);
double h2 = H(events, totalN);
double valueI = I(clusters, events, totalN);
double value_NMI = valueI / (double) ((h1 + h2) / (double) 2);
System.out.println("NMI: " + value_NMI);

static public double I(HashMap<String, ArrayList<String>> clusters, HashMap<String, ArrayList<String>> events, int totalN) {

        //store sorted content to contents
        Iterator<Map.Entry<String, ArrayList<String>>> it = events.entrySet().iterator();
        Iterator<Map.Entry<String, ArrayList<String>>> it2 = clusters.entrySet().iterator();

        String key;
        ArrayList<String> event;
        ArrayList<String> cluster;

        double valueI = 0;
        while (it.hasNext()) {

            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            key = mapItem.getKey();

            //if cluster doesn't exist
            //if(!clusters.containsKey(key)) continue;
            //cluster = clusters.get(key);

            event = mapItem.getValue();

            while (it2.hasNext()) {
                Map.Entry<String, ArrayList<String>> mapItem2 = it2.next();
                cluster = mapItem2.getValue();

                float common_docs = 0;
                for (int i = 0; i < event.size(); i++) {
                    for (int j = 0; j < cluster.size(); j++) {
                        if (event.get(i).equals(cluster.get(j))) {
                            common_docs = common_docs + 1;
                            break;
                        }
                    }
                }

                if (common_docs != 0) {
                    valueI = valueI + ((common_docs / (float) totalN) * Math.log((common_docs * totalN) / (float) (event.size() * cluster.size())));
                }
            }
        }

        return valueI;
    }


    static public double H(HashMap<String, ArrayList<String>> clusters, int totalN) {

        //store sorted content to contents
        Iterator<Map.Entry<String, ArrayList<String>>> it = clusters.entrySet().iterator();
        ArrayList<String> cluster;

        double entropy = 0;
        while (it.hasNext()) {

            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            cluster = mapItem.getValue();

            double ratio = cluster.size() / (float) totalN;
            entropy = entropy + ratio * Math.log(ratio);

        }

        return -entropy;
    }

    static public int getTotalN(HashMap<String, ArrayList<String>> dataset) {

        int totalN = 0;
        Iterator<Map.Entry<String, ArrayList<String>>> it = dataset.entrySet().iterator();
        ArrayList<String> item;

        while (it.hasNext()) {

            Map.Entry<String, ArrayList<String>> mapItem = it.next();
            item = mapItem.getValue();

            for (int i=0; i< item.size(); i++) {
                totalN = totalN + 1;
            }

        }

        return totalN ;
    }

3 Answers:

Answer 0 (score: 3)

I'd guess not. I only checked I(C,E): it2 is never reset between iterations of the outer loop, which the nested sum requires.
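The effect is easy to reproduce in isolation. The sketch below is not taken from the question; the class and method names are made up for illustration. It counts how many (outer, inner) pairs a nested loop visits when the inner iterator is created once, as in the question, versus once per outer element:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class IteratorResetDemo {

    // Mirrors the question: the inner iterator is created once, outside the outer loop.
    // After the first outer element it is exhausted and the inner loop never runs again.
    static int countWithSharedIterator(List<String> outer, List<String> inner) {
        int pairs = 0;
        Iterator<String> it2 = inner.iterator();
        for (String o : outer) {
            while (it2.hasNext()) {
                it2.next();
                pairs++;
            }
        }
        return pairs;
    }

    // The inner iterator is recreated for every outer element, so every pair is visited.
    static int countWithFreshIterator(List<String> outer, List<String> inner) {
        int pairs = 0;
        for (String o : outer) {
            Iterator<String> it2 = inner.iterator();
            while (it2.hasNext()) {
                it2.next();
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> outer = Arrays.asList("e1", "e2", "e3");
        List<String> inner = Arrays.asList("c1", "c2");
        System.out.println("shared iterator: " + countWithSharedIterator(outer, inner));
        System.out.println("fresh iterator:  " + countWithFreshIterator(outer, inner));
    }
}
```

With three outer and two inner elements, the shared iterator visits only the two pairs that involve the first outer element, while the fresh iterator visits all six. In I(C,E) this means only the first event is ever compared against the clusters.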

Answer 1 (score: 1)

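The iterator it2 in method I should be initialized inside the outer loop. You can simplify the code, and avoid this kind of mistake, by using the "foreach" notation: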

static public double I(HashMap<String, ArrayList<String>> clusters, HashMap<String, ArrayList<String>> events, int totalN) {

    String key;
    ArrayList<String> event;
    ArrayList<String> cluster;

    double valueI = 0;
    for (Map.Entry<String, ArrayList<String>> mapItem: events.entrySet()) {
        key = mapItem.getKey();

        //if cluster doesn't exist
        //if(!clusters.containsKey(key)) continue;
        //cluster = clusters.get(key);

        event = mapItem.getValue();

        for (Map.Entry<String, ArrayList<String>> mapItem2: clusters.entrySet()) {
            cluster = mapItem2.getValue();

            float common_docs = 0;
            for (int i = 0; i < event.size(); i++) {
                for (int j = 0; j < cluster.size(); j++) {
                    if (event.get(i).equals(cluster.get(j))) {
                        common_docs = common_docs + 1;
                        break;
                    }
                }
            }


            if (common_docs != 0) {
                valueI = valueI + ((common_docs / (float) totalN) * Math.log((common_docs * totalN) / (float) (event.size() * cluster.size())));
            }
        }
    }

    return valueI;
}

static public double H(HashMap<String, ArrayList<String>> clusters, int totalN) {

    //store sorted content to contents
    ArrayList<String> cluster;

    double entropy = 0;
    for (Map.Entry<String, ArrayList<String>> mapItem: clusters.entrySet()) {
        cluster = mapItem.getValue();

        double ratio = cluster.size() / (float) totalN;
        entropy = entropy + ratio * Math.log(ratio);

    }

    return -entropy;
}

static public int getTotalN(HashMap<String, ArrayList<String>> dataset) {

    int totalN = 0;
    ArrayList<String> item;

    for (Map.Entry<String, ArrayList<String>> mapItem: dataset.entrySet()) {
        item = mapItem.getValue();

        for (int i = 0; i < item.size(); i++) {
            totalN = totalN + 1;
        }

    }

    return totalN;
}
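One way to sanity-check an implementation like this is to feed it inputs whose NMI is known in advance: two identical partitions should give 1, and two partitions that are statistically independent should give 0. The following is a compact, self-contained sketch of that check, not the code above. It uses Map and List interfaces, double arithmetic throughout, and a set intersection for the common documents, which assumes document IDs are unique within a group. The names NmiSanityCheck, entropy, mutualInformation, nmi, and partition are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for checking NMI on tiny, hand-verifiable inputs.
class NmiSanityCheck {

    // H: entropy of the group-size distribution.
    static double entropy(Map<String, List<String>> groups, int totalN) {
        double sum = 0;
        for (List<String> group : groups.values()) {
            double ratio = group.size() / (double) totalN;
            if (ratio > 0) sum += ratio * Math.log(ratio);
        }
        return -sum;
    }

    // I: mutual information, summed over every (event, cluster) pair.
    static double mutualInformation(Map<String, List<String>> clusters,
                                    Map<String, List<String>> events, int totalN) {
        double sum = 0;
        for (List<String> event : events.values()) {
            for (List<String> cluster : clusters.values()) {
                Set<String> common = new HashSet<>(event);
                common.retainAll(new HashSet<>(cluster));
                double n = common.size();
                if (n > 0) {
                    sum += (n / totalN)
                            * Math.log((n * totalN) / ((double) event.size() * cluster.size()));
                }
            }
        }
        return sum;
    }

    // NMI: I divided by the mean of the two entropies.
    static double nmi(Map<String, List<String>> clusters, Map<String, List<String>> events) {
        int totalN = 0;
        for (List<String> event : events.values()) totalN += event.size();
        double meanEntropy = (entropy(clusters, totalN) + entropy(events, totalN)) / 2;
        return mutualInformation(clusters, events, totalN) / meanEntropy;
    }

    // Builds a partition from arrays of document IDs; the keys are arbitrary labels.
    static Map<String, List<String>> partition(String[]... groups) {
        Map<String, List<String>> result = new HashMap<>();
        for (int i = 0; i < groups.length; i++) {
            result.put("g" + i, Arrays.asList(groups[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a =
                partition(new String[]{"d1", "d2"}, new String[]{"d3", "d4"});
        Map<String, List<String>> b =
                partition(new String[]{"d1", "d3"}, new String[]{"d2", "d4"});
        System.out.println("identical partitions:   " + nmi(a, a));
        System.out.println("independent partitions: " + nmi(a, b));
    }
}
```

If the original methods give noticeably different values on inputs like these, the problem is in the summation logic rather than in the data.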

Answer 2 (score: 0)

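My guess is that you are not getting the expected results because of floating-point rounding errors (see this for details). I haven't looked at the code inside the methods that implement the three functions, but I noticed that you mix float and double, which may cause you trouble. You may want to use BigDecimal instead.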
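To gauge how much the float casts actually contribute, one can compare the ratio as H computes it (an int divided by a float, then widened to double) with the same ratio computed in double. This is only a sketch; the class and method names are hypothetical. Note also that java.math.BigDecimal has no logarithm method, so the Math.log calls would still need a double.

```java
// Hypothetical demo class comparing the two ways of computing size / totalN.
class FloatRatioDemo {

    // As in H: the division is carried out in float precision, then widened to double.
    static double ratioViaFloat(int size, int totalN) {
        return size / (float) totalN;
    }

    // The same ratio with the division carried out in double precision.
    static double ratioViaDouble(int size, int totalN) {
        return size / (double) totalN;
    }

    public static void main(String[] args) {
        double viaFloat = ratioViaFloat(1, 3);
        double viaDouble = ratioViaDouble(1, 3);
        System.out.println("via float:  " + viaFloat);
        System.out.println("via double: " + viaDouble);
        System.out.println("difference: " + Math.abs(viaFloat - viaDouble));
    }
}
```

Replacing the (float) casts with (double) removes this source of error without changing the structure of the code.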