我想通过在R的“ks”库中使用“kde”函数对5维数据(x,y,z,时间,大小)进行核密度估计。在它的手册中它说它可以做内核1至6维数据的密度估计(手册第24页:http://cran.r-project.org/web/packages/ks/ks.pdf)。
我的问题是它说超过3个维度我需要指定eval.points。我不知道如何指定评估点,因为没有超过3个维度的示例。例如,如果我想在问题空间中生成常规3D序列数据并将其用作评估点,我该怎么办?
这是我的数据:
422.697323 164.19886 2.457419 8.083796636 0.83367586
423.008236 163.32434 0.5551326 37.58477455 0.893893903
204.733908 218.36365 1.9397874 37.88324312 0.912809449
203.963056 218.4808 0.3723791 43.21775903 0.926406005
100.727581 46.60876 1.4022341 49.41510519 0.782807523
453.335182 244.25521 1.6292517 51.73779175 0.903910803
134.909462 210.96333 2.2389119 53.13433521 0.896529401
135.300562 212.02055 0.6739541 67.55073745 0.748783521
258.237117 134.29735 2.1205291 76.34032587 0.735699304
341.305271 149.26953 3.718958 94.33975483 0.849509216
307.138925 59.60571 0.6311074 106.9636715 0.987923188
307.76875 58.91453 2.6496741 113.8515307 0.802115718
415.025535 217.17398 1.7155688 115.7464603 0.875580325
414.977687 216.73327 1.7107369 115.9776948 0.767143582
311.006135 173.24378 2.7819572 120.8079566 0.925380118
310.116929 174.28122 4.3318722 129.2648401 0.776528535
347.260911 37.34946 3.5155427 136.7851291 0.851787115
351.317624 33.65703 0.5806926 138.7349284 0.909723017
4.471892 59.42068 1.4062959 139.0543783 0.967270976
5.480223 59.72857 2.7326106 139.2114277 0.987787428
199.513023 21.53302 2.5163259 143.5895625 0.864164659
198.718031 23.50163 0.4801849 147.2280466 0.741587333
26.650517 35.2019 0.8246514 150.4876506 0.744788202
25.089379 90.47825 0.8700944 152.1944046 0.777252476
26.307439 88.41552 2.4422487 155.9090026 0.952215177
234.282901 236.11422 1.8115261 155.9658144 0.776284654
235.052948 236.77437 1.9644963 156.6900297 0.944285448
23.048202 98.6261 3.4573048 159.7700912 0.773057491
21.516695 98.05431 2.5029284 160.8202997 0.978779087
213.936324 151.87013 3.1042192 161.0612489 0.80499513
277.887935 197.25753 1.3659279 163.673142 0.758978575
277.239746 197.54001 2.2109361 166.2629868 0.775325157
这是我正在使用的代码:
library(ks)
library(rgl)
kern <- read.table(file.choose(), sep=",")
hat <- kde(kern)
它适用于最多3个维度,但是对于4维和5维,它表示:需要为超过3个维度指定eval.points。
另外,我想知道如何绘制这些内核?例如,使用z作为条件变量并在3D散点图中绘制x,y,时间,并对不同的大小范围使用不同的颜色
答案 0 :(得分:2)
和你一样,我最初无法找到一个有用的例子,而且文档并没有真正描述出预期的对象类型。对于你的5d数据集,我尝试设置一个5d网格的点,这些点是从每个维度的10,25,50,70和90百分位数构建的。我的数据集名为“dat”:
evpts <- do.call(expand.grid, lapply(dat, quantile, prob=c(0.1,.25,.5,.75,.9)) )
然后我将它传递给了kde函数,似乎满足了算法。这是否“正确”确实需要检查。没有保证。
> hat <- kde(dat, eval.points= evpts)
> str(hat)
List of 8
$ x : num [1:31, 1:5] 423 423 205 204 101 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "V1" "V2" "V3" "V4" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ V1: Named num [1:3125] 23 118 234 326 415 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ V2: Named num [1:3125] 35.2 35.2 35.2 35.2 35.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V3: Named num [1:3125] 0.581 0.581 0.581 0.581 0.581 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V4: Named num [1:3125] 43.2 43.2 43.2 43.2 43.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V5: Named num [1:3125] 0.749 0.749 0.749 0.749 0.749 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "V1" "V2" "V3" "V4" ...
.. ..$ dimnames:List of 5
.. .. ..$ V1: chr [1:5] "V1= 23.0482" "V1=117.8185" "V1=234.2829" "V1=326.1557" ...
.. .. ..$ V2: chr [1:5] "V2= 35.20190" "V2= 59.51319" "V2=149.26953" "V2=211.49194" ...
.. .. ..$ V3: chr [1:5] "V3=0.5806926" "V3=1.1180112" "V3=1.9397874" "V3=2.5830000" ...
.. .. ..$ V4: chr [1:5] "V4= 43.21776" "V4= 71.94553" "V4=129.26484" "V4=151.34103" ...
.. .. ..$ V5: chr [1:5] "V5=0.7487835" "V5=0.7764066" "V5=0.8517871" "V5=0.9190948" ...
$ estimate : Named num [1:3125] 3.23e-08 5.70e-08 1.01e-08 4.07e-10 6.20e-12 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 5073.879 1010.815 1.211 -651.089 -0.223 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "V1" "V2" "V3" "V4" ...
$ w : num [1:31] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"
我确实找到了一个早期版本的软件包文档,它提供了这个4d执行的工作示例,我认为我的努力基本相同,模数不同:
data(iris)
ir <- iris[,1:4][iris[,5]=="setosa",]
H.scv <- Hscv(ir)
fhat <- kde(ir, H.scv, eval.points=ir)