I am trying to cluster customer data by spatial location. This is what I did:
#Reading the data
theData <- read.csv("Customer_Segmentation/data.csv")
#Subsetting only long, lat and record id.
inputdata <- data.frame(long=theData$LONG, lat=theData$LAT, RecordID=theData$RecordID)
#Building distance matrix
library(fossil)
d = earth.dist(inputdata, dist = TRUE)
#Applying DBSCAN Clustering
library(fpc)
ds <- dbscan(d,eps = 0.5,MinPts = 50, method = "dist")
It gave me about 23 clusters:
dbscan Pts=14873 MinPts=50 eps=0.5
          0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23
border 6546   73   47   38   20   53   60   27   70   19   93   43   58   25   21   31   36  492   47   44   41   43   55   35
seed      0  757   12   26   84   84    6   36    6   50 2132   70    2  101   91   55  104 2908   22   23   42   82   59  104
total  6546  830   59   64  104  137   66   63   76   69 2225  113   60  126  112   86  140 3400   69   67   83  125  114  139
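A note on units for the eps value used above: as far as I know, fossil's earth.dist returns distances in kilometres, so eps = 0.5 asks for at least 50 neighbours within roughly 500 m. The sketch below is not the fossil implementation; it is a plain haversine calculation in base R (the function names and the 6371 km mean Earth radius are my own assumptions) applied to the six sample rows from the question, to show how far apart such points actually are.

```r
# Haversine great-circle distance in kilometres between lon/lat points.
# The default radius of 6371 km is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Full pairwise distance matrix for a data frame with columns long and lat.
pairwise_km <- function(pts) {
  n <- nrow(pts)
  outer(seq_len(n), seq_len(n), function(i, j) {
    haversine_km(pts$long[i], pts$lat[i], pts$long[j], pts$lat[j])
  })
}

# The six sample rows from the question.
pts <- data.frame(
  long = c(174.9066, 174.9093, 174.8893, 174.8973, 174.9153, 174.9239),
  lat  = c(-41.20867, -41.22624, -41.21618, -41.21133, -41.20419, -41.20167)
)

d_km <- pairwise_km(pts)
# Number of other points within 0.5 km of each point.
neighbours_within_eps <- rowSums(d_km <= 0.5) - 1
print(round(d_km, 2))
print(neighbours_within_eps)
```

Printing the matrix for a sample of the real data gives a feel for whether 0.5 km is a sensible neighbourhood radius for the customer density at hand.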
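First question: how do I plot these clusters on a map? It would be great if someone could point me to some sample code for plotting the clusters. I am trying to plot them on a map of New Zealand. I tried downloading the boundaries and transforming them as follows: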
library(sp)
library(rgdal)
library(raster)  # provides getData()
nz1 <- getData("GADM", country = "NZ", level = 1)
nz1 <- spTransform(nz1, CRS = CRS("+init=epsg:2135"))
But I get this error on my Mac:
Error in spTransform(nz1, CRS = CRS("+init=epsg:2135")) :
error in evaluating the argument 'CRSobj' in selecting a method for function 'spTransform': Error in CRS("+init=epsg:2135") : no system list, errno: 2
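Second question: I read somewhere that k-means is not well suited to spatial clustering, so I tried hierarchical clustering, but it produced one large, very dense dendrogram from which I could not extract any information. So I chose DBSCAN instead. In its output, however, I can see that some points fall on the cluster borders, as the result above shows. I know that I need roughly 50-70 customers in each cluster, but what eps value should I choose? Here is a sample of my data: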
long lat RecordID
1 174.9066 -41.20867 90
2 174.9093 -41.22624 91
3 174.8893 -41.21618 92
4 174.8973 -41.21133 93
5 174.9153 -41.20419 94
6 174.9239 -41.20167 95
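On choosing eps: one heuristic that is often suggested for DBSCAN is to sort the distance from every point to its k-th nearest neighbour (with k close to MinPts) and look for the "knee" where the curve starts to rise sharply. The sketch below illustrates the idea in base R on made-up planar data; the helper name kth_neighbour_dist and the toy data are assumptions, and for the data in the question `d` would be the earth.dist matrix and k something near 50. Note also that MinPts is a density threshold and not a cluster-size limit, so DBSCAN by itself does not guarantee 50-70 customers per cluster.

```r
# Distance from every point to its k-th nearest neighbour, sorted ascending.
# `d` is a "dist" object or a full symmetric distance matrix.
kth_neighbour_dist <- function(d, k) {
  m <- as.matrix(d)
  stopifnot(k >= 1, k < nrow(m))
  # Sort each row; position 1 is the zero self-distance, so take k + 1.
  kth <- apply(m, 1, function(row) sort(row)[k + 1])
  sort(unname(kth))
}

# Toy data (hypothetical): two tight groups of 30 points plus one outlier.
set.seed(1)
xy <- rbind(
  cbind(rnorm(30, 0, 0.1), rnorm(30, 0, 0.1)),
  cbind(rnorm(30, 5, 0.1), rnorm(30, 5, 0.1)),
  c(20, 20)
)
kd <- kth_neighbour_dist(dist(xy), k = 4)

# The curve stays low for points in dense regions and jumps for outliers;
# a value just below the jump is a reasonable first guess for eps.
plot(kd, type = "l",
     xlab = "points sorted by distance",
     ylab = "distance to 4th nearest neighbour")
```

If the resulting clusters are still far larger than wanted (like clusters 10 and 17 in the output above), re-running the clustering inside those clusters with a smaller eps is one option to try.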
Session info, updated as requested:
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] raster_2.3-40 rgdal_0.9-2 sp_1.1-0
loaded via a namespace (and not attached):
[1] grid_3.1.2 lattice_0.20-29 tools_3.1.2
Output of library(rgdal), updated as requested:
library(rgdal)
rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/local/share/epsg_csv
Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
Path to PROJ.4 shared files: (autodetected)
Warning message:
package ‘rgdal’ was built under R version 3.1.3
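Note: I have stated clearly that I am trying to plot the spatial clustering output and am looking for options, and that one of the options I tried gave an error. There is also the question of how to handle the border points of the clusters.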
Answer 0 (score: 0)
The following code runs without problems on my machine:
library(sp)
library(rgdal)
library(raster)
nz1 = getData("GADM", country = "NZ", level = 1)
nz1 = spTransform(nz1, CRS = CRS("+init=epsg:2135"))
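Once the map loads, one possible way to draw the clusters is sketched below. It is only an illustration: plot_clusters is a hypothetical helper, the cluster labels in the usage part are made up, and the colours are arbitrary. As I understand the fpc object, ds$cluster holds the labels (0 for noise) and ds$isseed flags the seed points, so those could be passed in directly. The points are in longitude/latitude, so the basemap should be in longitude/latitude as well, which means using nz1 before the spTransform call or transforming the points to the same CRS.

```r
# Draw clustered points, optionally on top of a basemap that has a plot()
# method (for example the untransformed `nz1` polygons).
# `cluster` uses 0 for noise, as in the dbscan output in the question.
plot_clusters <- function(long, lat, cluster, is_seed = NULL, basemap = NULL) {
  stopifnot(length(long) == length(lat), length(lat) == length(cluster))
  ids <- sort(unique(cluster[cluster > 0]))
  palette_cols <- grDevices::rainbow(max(1, length(ids)))
  # Noise points are grey; every real cluster gets its own colour.
  cols <- ifelse(cluster == 0, "grey70", palette_cols[match(cluster, ids)])
  # Filled symbols for seed points, hollow symbols for border points.
  pch <- if (is.null(is_seed)) 16 else ifelse(is_seed, 16, 1)
  if (is.null(basemap)) {
    plot(long, lat, col = cols, pch = pch,
         xlab = "longitude", ylab = "latitude")
  } else {
    plot(basemap)
    points(long, lat, col = cols, pch = pch)
  }
  invisible(data.frame(long, lat, cluster, col = cols,
                       stringsAsFactors = FALSE))
}

# Sample rows from the question with made-up (hypothetical) labels.
long <- c(174.9066, 174.9093, 174.8893, 174.8973, 174.9153, 174.9239)
lat  <- c(-41.20867, -41.22624, -41.21618, -41.21133, -41.20419, -41.20167)
cluster <- c(1, 1, 2, 2, 0, 0)
is_seed <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
res <- plot_clusters(long, lat, cluster, is_seed)
```

With the real objects the call would look like plot_clusters(inputdata$long, inputdata$lat, ds$cluster, ds$isseed, basemap = nz1), assuming nz1 is still in longitude/latitude.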
Here is my sessionInfo():
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.10
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] raster_2.3-40 rgdal_0.9-2 sp_1.0-17 dplyr_0.4.1 pROC_1.8
[6] DBI_0.3.1
loaded via a namespace (and not attached):
[1] lazyeval_0.1.10 R6_2.0.1 plyr_1.8.1 magrittr_1.5
[5] assertthat_0.1 wakefield_0.2.0 parallel_3.2.0 tools_3.2.0
[9] Rcpp_0.11.4 grid_3.2.0 lattice_0.20-31
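I am fairly sure this is system-related. I do not normally work with geospatial data, so I had to set up all the requirements from scratch: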
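1. Used ppa:ubuntugis to set up the latest version of GDAL.
2. Installed gdal-bin, libgdal1-dev & libproj-dev.
3. Installed the R packages raster and rgdal.
As suggested by @RobertH, here are the rgdal package load-time messages: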
rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/share/gdal/1.11
Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
Path to PROJ.4 shared files: (autodetected)
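The "no system list, errno: 2" message usually means that PROJ.4 could not find its "epsg" lookup file, which is what "+init=epsg:2135" needs (errno 2 is "no such file or directory"). A quick check that could be run on the failing machine is sketched below. It uses base R only; find_epsg_dir is a hypothetical helper and the candidate directories are guesses that will need adjusting to the local install.

```r
# Return the first directory in `candidates` that contains a file named
# "epsg" (the table PROJ.4 reads for "+init=epsg:NNNN"), or NA if none does.
find_epsg_dir <- function(candidates) {
  candidates <- candidates[nzchar(candidates)]
  hits <- candidates[file.exists(file.path(candidates, "epsg"))]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Candidate locations are assumptions, not taken from the question.
candidates <- c(
  Sys.getenv("PROJ_LIB"),
  "/usr/share/proj",
  "/usr/local/share/proj"
)

epsg_dir <- find_epsg_dir(candidates)
if (is.na(epsg_dir)) {
  message("No 'epsg' file found in: ",
          paste(candidates[nzchar(candidates)], collapse = ", "))
} else {
  message("Found PROJ.4 'epsg' file in: ", epsg_dir)
}
```

If the file turns up in a directory that PROJ.4 is not searching, setting the PROJ_LIB environment variable to that directory (for example with Sys.setenv) before loading rgdal is a commonly suggested workaround, though I have not verified it on the asker's setup.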