How to plot DBSCAN clustering output in R

Time: 2015-05-09 09:07:31

Tags: r plot cluster-analysis hierarchical-clustering dbscan

I am trying to cluster customer data by spatial location. This is what I have done so far:

#Reading the data
theData <- read.csv("Customer_Segmentation/data.csv")

#Subsetting only long, lat and record id.
inputdata <- data.frame(long=theData$LONG, lat=theData$LAT, RecordID=theData$RecordID)

#Building distance matrix
library(fossil)
d = earth.dist(inputdata, dist = TRUE) 

#Applying DBSCAN Clustering
library(fpc)
ds <- dbscan(d,eps = 0.5,MinPts = 50, method = "dist")

It gave me about 23 clusters:

dbscan Pts=14873 MinPts=50 eps=0.5
      0   1  2  3   4   5  6  7  8  9   10  11 12  13  14 15  16   17 18 19 20  21  22  23
border 6546  73 47 38  20  53 60 27 70 19   93  43 58  25  21 31  36  492 47 44 41  43  55  35
seed      0 757 12 26  84  84  6 36  6 50 2132  70  2 101  91 55 104 2908 22 23 42  82  59 104
total  6546 830 59 64 104 137 66 63 76 69 2225 113 60 126 112 86 140 3400 69 67 83 125 114 139
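
For reading this table: column 0 holds the points DBSCAN treated as noise, and each total is border plus seed. As a minimal sketch, assuming the fitted object carries one label per input row in ds$cluster and a logical core-point flag in ds$isseed (as fpc's dbscan objects do), the labels can be attached back to the records. The vectors and the data frame below are made-up stand-ins, not the real data.

```r
# Stand-ins for ds$cluster and ds$isseed (hypothetical values, not the real output)
cluster <- c(0, 1, 1, 1, 2, 2, 0, 2, 1, 2)
isseed  <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Stand-in for inputdata; rows are in the same order as the labels
records <- data.frame(
  long     = 174.90 + (0:9) / 1000,
  lat      = -41.20 - (0:9) / 1000,
  RecordID = 90:99
)

# Attach the label and a readable role to every record
records$cluster <- cluster
records$role <- ifelse(cluster == 0, "noise",
                       ifelse(isseed, "seed", "border"))

# Cluster sizes, with noise (label 0) excluded
sizes <- table(records$cluster[records$cluster != 0])
print(sizes)

# Role by cluster, similar in spirit to the printed dbscan summary
role_by_cluster <- table(records$role, records$cluster)
print(role_by_cluster)
```

With the real objects, the two stand-in vectors would be replaced by ds$cluster and ds$isseed, and records by inputdata.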

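First question: how do I plot these clusters on a map? It would be great if someone could point me to some example code for plotting clusters. I am trying to plot this on a map of New Zealand. I tried downloading the boundaries and transforming them as follows: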

library(sp)
library(rgdal)
library(raster)  # provides getData()
nz1 <- getData("GADM", country = "NZ", level = 1)
nz1 <- spTransform(nz1, CRS = CRS("+init=epsg:2135"))

But I get this error on my Mac:

Error in spTransform(nz1, CRS = CRS("+init=epsg:2135")) : 
  error in evaluating the argument 'CRSobj' in selecting a method for function 'spTransform': Error in CRS("+init=epsg:2135") : no system list, errno: 2
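
Since the input coordinates are already longitude/latitude, a first look at the clusters may not need the reprojection step at all. Below is a minimal sketch, assuming a data frame with long, lat and a cluster column; the points and labels are synthetic, not the real data. The coastline overlay uses the nz database of the maps package and is skipped if that package is not installed.

```r
set.seed(1)

# Synthetic points around two hypothetical centres, plus scattered noise
pts <- data.frame(
  long = c(rnorm(60, 174.78, 0.02), rnorm(60, 174.91, 0.02), runif(20, 174.6, 175.1)),
  lat  = c(rnorm(60, -41.29, 0.02), rnorm(60, -41.21, 0.02), runif(20, -41.4, -41.1)),
  cluster = rep(c(1, 2, 0), times = c(60, 60, 20))  # stand-in for ds$cluster
)

# One colour per cluster; noise (label 0) is drawn in grey
ids <- sort(unique(pts$cluster[pts$cluster != 0]))
palette_cols <- setNames(rainbow(length(ids)), ids)
pts$col <- ifelse(pts$cluster == 0, "grey70",
                  palette_cols[as.character(pts$cluster)])

# asp corrects for the shrinking of a degree of longitude at this latitude
plot(pts$long, pts$lat, col = pts$col, pch = 19, cex = 0.6,
     asp = 1 / cos(mean(pts$lat) * pi / 180),
     xlab = "Longitude", ylab = "Latitude",
     main = "DBSCAN clusters (noise in grey)")

# Optional coastline, only if the maps package is available
if (requireNamespace("maps", quietly = TRUE)) {
  maps::map("nz", add = TRUE)
}

legend("topright", legend = c(paste("cluster", ids), "noise"),
       col = c(palette_cols, "grey70"), pch = 19, cex = 0.8)
```

With 23 clusters a legend gets crowded, so labelling cluster centroids with text() may read better than colours alone.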

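Second question: I read somewhere that k-means is not well suited to spatial clustering, so I tried hierarchical clustering, but it produced a large, dense dendrogram from which I could not extract any information. So I chose DBSCAN instead. In its output, however, I can see that many points are classified as border points. I am sure I need roughly 50-70 customers in each cluster. What eps value should I choose? Here is a sample of my data: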

      long       lat RecordID
1 174.9066 -41.20867       90 
2 174.9093 -41.22624       91 
3 174.8893 -41.21618       92 
4 174.8973 -41.21133       93
5 174.9153 -41.20419       94
6 174.9239 -41.20167       95 
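
On the eps question: earth.dist returns kilometres, so eps = 0.5 means a 500 m neighbourhood. A common heuristic is to compute each point's distance to its k-th nearest neighbour (with k tied to MinPts), sort those distances, and pick eps near the bend of the curve. The sketch below does this in base R on synthetic points with a small k; haversine_km is a helper written for this illustration, not a fossil function, and the coordinates are made up.

```r
# Great-circle distance in km (haversine formula, spherical Earth of radius 6371 km)
haversine_km <- function(long1, lat1, long2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat  <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

set.seed(42)

# 150 densely packed points plus 50 widely scattered ones (synthetic)
long <- c(rnorm(150, 174.90, 0.005), runif(50, 174.6, 175.2))
lat  <- c(rnorm(150, -41.21, 0.005), runif(50, -41.5, -41.0))

# Full pairwise distance matrix in km
idx <- seq_along(long)
d <- outer(idx, idx,
           function(i, j) haversine_km(long[i], lat[i], long[j], lat[j]))

k <- 10  # plays the role of MinPts here; the question uses 50 on the full data

# Position 1 of each sorted row is the point itself (distance 0),
# so the k-th nearest neighbour sits at position k + 1
kth_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Sorted k-distance curve: a sharp bend suggests a candidate eps
plot(sort(kth_dist), type = "l",
     xlab = "Points sorted by k-NN distance",
     ylab = paste0("Distance to neighbour ", k, " (km)"))

print(quantile(kth_dist, c(0.25, 0.5, 0.75, 0.9)))
```

A smaller eps shrinks clusters and pushes more points into noise, while a larger one merges neighbouring clusters, so DBSCAN does not directly guarantee 50-70 customers per cluster. The dbscan package also offers kNNdistplot for the same kind of curve.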

Updated with the session info, as requested:

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] raster_2.3-40 rgdal_0.9-2   sp_1.1-0     

loaded via a namespace (and not attached):
[1] grid_3.1.2      lattice_0.20-29 tools_3.1.2   

Updated with the library(rgdal) output, as requested:

library(rgdal)
rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/local/share/epsg_csv
Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
Path to PROJ.4 shared files: (autodetected)
Warning message:
package ‘rgdal’ was built under R version 3.1.3 

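Note: I have stated clearly that I am trying to plot spatial clustering output and am looking for options, and that one of the options I tried raised an error. There is also the question of how to deal with the border points in the clusters.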

1 answer:

Answer 0: (score: 0)

The following code runs without problems on my machine:

library(sp)
library(rgdal)
library(raster)
nz1 = getData("GADM", country = "NZ", level = 1) 
nz1 = spTransform(nz1, CRS = CRS("+init=epsg:2135"))

Here is my sessionInfo():

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] raster_2.3-40 rgdal_0.9-2   sp_1.0-17     dplyr_0.4.1   pROC_1.8     
[6] DBI_0.3.1    

loaded via a namespace (and not attached):
 [1] lazyeval_0.1.10 R6_2.0.1        plyr_1.8.1      magrittr_1.5   
 [5] assertthat_0.1  wakefield_0.2.0 parallel_3.2.0  tools_3.2.0    
 [9] Rcpp_0.11.4     grid_3.2.0      lattice_0.20-31

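I am fairly sure this is system-related. I do not normally work with geospatial data, so I had to set up all the requirements from scratch: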

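  1. Set up the latest version of GDAL from ppa:ubuntugis, as suggested here.
  2. Then installed gdal-bin, libgdal1-dev and libproj-dev.
  3. Installed the R packages raster and rgdal.
  4. Edit:

    As suggested by @RobertH, adding the rgdal package load-time messages: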

    rgdal: version: 0.9-2, (SVN revision 526)
    Geospatial Data Abstraction Library extensions to R successfully loaded
    Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
    Path to GDAL shared files: /usr/share/gdal/1.11
    Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
    Path to PROJ.4 shared files: (autodetected)