Question

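I am trying to use the DBSCANClusterer from org.apache.commons.math3.ml.clustering. The cluster function returns a list of clusters, but for me the size of the list is always 0. What am I doing wrong? Here is my test code: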

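import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class ClusterTest {

    public static void main(String[] args) throws IOException {
        // eps = 0.05 (neighbourhood radius), minPts = 15
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(.05, 15);
        List<DoublePoint> points = getData();
        List<Cluster<DoublePoint>> cluster = dbscan.cluster(points);
        for (Cluster<DoublePoint> p : cluster)
            System.out.println(p.getPoints().toString());
    }

    private static List<DoublePoint> getData() throws IOException {
        List<DoublePoint> data = new ArrayList<DoublePoint>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader("clust.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    String[] l = line.split("\t");
                    // Allocate a fresh array per line so that points do not share one backing array
                    double[] d = { Double.parseDouble(l[0]), Double.parseDouble(l[1]) };
                    data.add(new DoublePoint(d));
                } catch (RuntimeException e) {
                    // Report malformed lines instead of silently dropping them
                    System.err.println("Skipping line: " + line);
                }
            }
        }
        return data;
    }
}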

The file clust.txt contains two columns, with the X and Y values separated by a tab. I have tried several different data sets and I always get 0.

Answer 1

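Try the version in ELKI instead. Unfortunately, Apache Commons Math is not very good. I moved away from Commons Math because of various small issues; ELKI has worked much better for me.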

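From a quick look, Commons Math is still awful when it comes to cluster analysis... MATH-917 was the last change to touch it. The DBSCAN code there is still very inefficient. In the previous version, DBSCAN used nothing but deprecated classes, yet it has received only about 4 commits over x years.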

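If you do not get any clusters, your epsilon is probably too small and your minPts value too high. Also, the commons-math implementation of DBSCAN discards all noise objects, so that is probably what you are getting: all noise.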
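
One way to check whether the parameters are the culprit is to count, for every point, how many other points lie within epsilon. If no point reaches minPts neighbours, there are no core points and everything ends up as noise. The following is only a rough sketch in plain Java: it does not use Commons Math, the class and method names (NeighbourCheck, neighbourCounts, corePointCount) are made up for illustration, and it assumes that a point does not count as its own neighbour, which may differ between DBSCAN implementations.

```java
import java.util.Arrays;

public class NeighbourCheck {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** For each point, counts the OTHER points within eps (assumption: the point itself is excluded). */
    public static int[] neighbourCounts(double[][] points, double eps) {
        int[] counts = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            for (int j = 0; j < points.length; j++) {
                if (i != j && distance(points[i], points[j]) <= eps) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    /** Number of points with at least minPts neighbours, i.e. candidate core points. */
    public static int corePointCount(double[][] points, double eps, int minPts) {
        int core = 0;
        for (int c : neighbourCounts(points, eps)) {
            if (c >= minPts) {
                core++;
            }
        }
        return core;
    }

    public static void main(String[] args) {
        // Made-up toy data: three points close together and one far away
        double[][] points = { {0.00, 0.00}, {0.01, 0.00}, {0.00, 0.01}, {5.00, 5.00} };
        System.out.println(Arrays.toString(neighbourCounts(points, 0.05))); // [2, 2, 2, 0]
        System.out.println(corePointCount(points, 0.05, 15));               // 0 -> all noise
        System.out.println(corePointCount(points, 0.05, 2));                // 3
    }
}
```

If the core point count for your eps and minPts is 0 on your real data, increase epsilon or lower minPts until some points qualify. The brute-force loop is quadratic in the number of points, which is fine for a quick check on small files.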
