Question

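I described a distance matrix in my earlier question here:

Clustering with a distance matrix

Now I want to run DBSCAN on this matrix using the DBSCANClusterer.java class from Apache Commons Math.

The 'cluster' method takes a collection of points as input. What format should these points be in?

Referring to the matrix above, what do I add to the collection parameter?

Could someone paste a code snippet? I want to specify the distances as:

A, B: 20 A, C: 20 ...

Then, when the clustering is done, similar samples should end up grouped together.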

Answer 1

Hope this helps.

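import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class App {

    public static void main(String[] args) throws IOException {
        File[] files = getFiles("./files2/");

        // eps = 0.05 (neighbourhood radius), minPts = 50; tune both for your data.
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(.05, 50);
        List<Cluster<DoublePoint>> clusters = dbscan.cluster(getGPS(files));

        // Print the first point of each cluster that was found.
        for (Cluster<DoublePoint> c : clusters) {
            System.out.println(c.getPoints().get(0));
        }
    }

    private static File[] getFiles(String dir) {
        File[] files = new File(dir).listFiles();
        // listFiles() returns null when the directory does not exist.
        return files != null ? files : new File[0];
    }

    // Reads comma-separated lines and keeps columns 1 and 2 as a 2-D point.
    private static List<DoublePoint> getGPS(File[] files) throws IOException {
        List<DoublePoint> points = new ArrayList<DoublePoint>();
        for (File f : files) {
            // try-with-resources closes each file once it has been read.
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                String line;
                while ((line = in.readLine()) != null) {
                    try {
                        String[] fields = line.split(",");
                        double[] d = new double[2];
                        d[0] = Double.parseDouble(fields[1]);
                        d[1] = Double.parseDouble(fields[2]);
                        points.add(new DoublePoint(d));
                    } catch (ArrayIndexOutOfBoundsException e) {
                        // Too few columns: skip the line.
                    } catch (NumberFormatException e) {
                        // Non-numeric value: skip the line.
                    }
                }
            }
        }
        return points;
    }
}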

Sample data:

12-01-99 11:31:01 AM, -40.010, -70.020
12-01-99 11:32:01 AM, -41.010, -71.020
12-01-99 11:33:01 AM, -42.010, -72.020
12-01-99 11:34:01 AM, -43.010, -73.020
12-01-99 11:35:01 AM, -40.010, -74.020

Put all your data files in a folder named files2, at the path passed to getFiles (./files2/ in the code above).
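
The code above clusters coordinate points read from files; it does not use a distance matrix. For the distance-matrix case in the question, one possible approach (a sketch, not something the answer itself confirms) is to let each "point" carry only its row index and have the distance function look the value up in the matrix. PrecomputedDistance, set, pointFor and the A,D distance are hypothetical names and values made up for illustration, and the sketch uses only the JDK so that it compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Commons Math): a label-indexed, symmetric distance table. */
class PrecomputedDistance {
    private final List<String> labels;
    private final double[][] matrix;

    PrecomputedDistance(String... labels) {
        this.labels = Arrays.asList(labels);
        int n = labels.length;
        matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            // Pairs that are never set stay infinitely far apart; the diagonal is 0.
            Arrays.fill(matrix[i], Double.POSITIVE_INFINITY);
            matrix[i][i] = 0.0;
        }
    }

    private int indexOf(String label) {
        int i = labels.indexOf(label);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return i;
    }

    /** Records one entry such as "A,B: 20" in both directions. */
    void set(String a, String b, double distance) {
        int i = indexOf(a);
        int j = indexOf(b);
        matrix[i][j] = distance;
        matrix[j][i] = distance;
    }

    /** The "coordinates" of a sample are just its row index in the matrix. */
    double[] pointFor(String label) {
        return new double[] { indexOf(label) };
    }

    /** Same shape as DistanceMeasure.compute(double[], double[]): a plain matrix lookup. */
    double compute(double[] a, double[] b) {
        return matrix[(int) a[0]][(int) b[0]];
    }

    public static void main(String[] args) {
        PrecomputedDistance distances = new PrecomputedDistance("A", "B", "C", "D");
        distances.set("A", "B", 20); // from the question
        distances.set("A", "C", 20); // from the question
        distances.set("A", "D", 90); // made-up value for illustration
        System.out.println(distances.compute(distances.pointFor("A"), distances.pointFor("B")));
    }
}
```

With commons-math3 on the classpath, distances::compute should be usable as the DistanceMeasure argument of the three-argument DBSCANClusterer(eps, minPts, measure) constructor, with each sample wrapped as new DoublePoint(distances.pointFor(label)). Treat that wiring as an assumption to check against the library version you use.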
