I'm trying to draw a 2D density plot on top of a scatter plot in ggplot2.
I have the following working code:
plt <- ggplot(data = for_plot, aes(x = X, y = Y)) +
  stat_density2d(aes(fill = ..level.., alpha = ..level..), geom = 'polygon', colour = 'black') +
  scale_fill_continuous(low = "green", high = "red") +
  guides(alpha = "none") +
  ylim(0.5, max(shortest_path_list$shortest_path)) +
  geom_point()
When I run the code with this dataset:
> for_plot[sample(nrow(for_plot), 20), ]
Y X
1: 2 110182.549
2: 3 95202.283
3: 2 91557.371
4: 1 6730.598
5: 1 7396.081
6: 1 13939.701
7: 2 9767.561
8: 3 101597.449
9: 2 99368.467
10: 3 102024.722
11: 3 90491.076
12: 3 81337.624
13: 1 5956.710
14: 3 95160.149
15: 3 89981.055
16: 1 8823.615
17: 1 10717.879
18: 2 11463.036
19: 2 3864.292
20: 2 10351.874
Note that my Y is discrete and my X is continuous, and the plot comes out fine.
However, when I use this dataset:
> for_plot[sample(nrow(for_plot), 20), ]
Y X
1: 1 9897.476
2: 2 2350.191
3: 1 13911.780
4: 1 98885.336
5: 1 94776.873
6: 1 102804.832
7: 1 99956.988
8: 1 13941.653
9: 1 9246.795
10: 1 13152.775
11: 1 113325.680
12: 1 82263.657
13: 1 91108.347
14: 1 8823.797
15: 1 11057.255
16: 1 99150.825
17: 2 7312.730
18: 2 6476.152
19: 1 113534.588
20: 1 91311.834
I get the following warning and plot:
Warning message:
Computation failed in `stat_density2d()`:
bandwidths must be strictly positive
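I know that one common cause of this error is having no variation in the X or Y direction. In this case, however, there appears to be variation similar to the first case, so I don't understand what makes the first scenario work while the second one fails. Is there a workaround to get the contours in the second case?
Following Mr. Flick's suggestion, here are the two scenarios as minimal reproducible examples:
Scenario 1 (the plot works):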
set.seed(100)
> for_plot<-dput(for_plot[sample(nrow(for_plot), 20), ])
structure(list(Y = c(2, 2, 3, 1, 2,
3, 3, 3, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 1), X = c(96649.7975713206,
104758.02495167, 93351.5907987183, 5535.8146932624, 99480.6016841293,
113103.505637801, 90445.3465777551, 81903.811792781, 106832.148472597,
6576.45291001145, 99368.9134426028, 111130.390217174, 9471.82883910966,
102087.415882298, 5657.05900168211, 107688.549964059, 103669.855375872,
94121.8586312176, 1573.00051813297, 7394.05750749363)), .Names = c("Y", "X"), class = c("data.table",
"data.frame"), row.names = c(NA, -20L), .internal.selfref = <pointer: 0x00000000065c0788>)
Scenario 2 (the plot does not produce the desired output):
> for_plot<-dput(for_plot[sample(nrow(for_plot), 20), ])
structure(list(Y = c(1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
X = c(96925.0119740431, 98869.1560687514, 99434.7995468473,
9123.65901167288, 111471.920587976, 109448.280478224, 6678.04323546572,
98309.4525934759, 91311.834287723, 86616.727265815, 101009.644050382,
7396.08053430818, 102517.086739334, 11504.3148787722, 9471.82883910966,
15427.4786153589, 96385.4989659007, 2249.38197350042, 91425.5491534976,
9303.7114788096)), .Names = c("Y",
"X"), class = c("data.table", "data.frame"), row.names = c(NA,
-20L), .internal.selfref = <pointer: 0x00000000065c0788>)
Error:
Warning message:
Computation failed in `stat_density2d()`:
bandwidths must be strictly positive
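A possible explanation, offered as a sketch rather than a confirmed diagnosis: when no bandwidth is supplied, `stat_density2d()` estimates it with `MASS::bandwidth.nrd()`, which uses the smaller of the standard deviation and the interquartile range divided by 1.34. In scenario 2, 16 of the 20 Y values are 1, so the first and third quartiles coincide and the Y bandwidth comes out as exactly 0 even though the variance is not 0. The names `y1` and `y2` below are just illustrative copies of the Y columns from the two `dput()` outputs.

```r
library(MASS)  # ships with R as a recommended package

# Y columns copied from the two dput() outputs above
y1 <- c(2, 2, 3, 1, 2, 3, 3, 3, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 1)
y2 <- c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2)

# Both vectors have non-zero variance ...
var_y1 <- var(y1)
var_y2 <- var(y2)

# ... but the interquartile range of y2 is zero
iqr_y1 <- IQR(y1)
iqr_y2 <- IQR(y2)

# bandwidth.nrd() uses min(sd, IQR / 1.34), so a zero IQR gives a zero bandwidth
bw_y1 <- bandwidth.nrd(y1)
bw_y2 <- bandwidth.nrd(y2)

print(c(var = var_y2, IQR = iqr_y2, bandwidth = bw_y2))
```

If this is what is happening, whether a given subset works would depend on whether at least a quarter of its Y values differ from the most common value, not on whether Y varies at all.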
Update
One way to get the kernel to work is to add some random noise to the Y variable so that its variance is no longer 0.
# Add variability to Y so the kernel density can be estimated
rand_noise <- runif(nrow(for_plot), -0.1, 0.1)
for_plot$Y_noise <- for_plot$Y + rand_noise
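Although the error goes away and the kernels are generated, they are not as nice and uniform as in scenario 1.
As I mentioned in the comments, what really puzzles me is why scenario 1 always works by default while scenario 2 never does. I have tried different subsets of the data to verify this. The data in scenario 1 and scenario 2 are of the same nature.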
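If a zero default bandwidth for Y turns out to be the cause, one workaround that avoids perturbing the data is to pass the bandwidths explicitly through the `h` argument of `stat_density_2d()`. The sketch below falls back to a bandwidth based on the standard deviation alone whenever `MASS::bandwidth.nrd()` returns 0. That fallback rule and the helper name `safe_bandwidth` are assumptions for illustration, not something ggplot2 provides. The sketch also uses `after_stat(level)`, which needs ggplot2 3.3.0 or later (older versions would use `..level..`), and leaves out the `ylim()` call because `shortest_path_list` is not part of the example data.

```r
library(ggplot2)
library(MASS)

# Scenario 2 data, rebuilt as a plain data.frame (X rounded to 2 decimals)
for_plot <- data.frame(
  Y = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
  X = c(96925.01, 98869.16, 99434.80, 9123.66, 111471.92, 109448.28,
        6678.04, 98309.45, 91311.83, 86616.73, 101009.64, 7396.08,
        102517.09, 11504.31, 9471.83, 15427.48, 96385.50, 2249.38,
        91425.55, 9303.71)
)

# Hypothetical helper: default rule, or an sd-only rule if that gives 0
safe_bandwidth <- function(v) {
  h <- bandwidth.nrd(v)
  if (h > 0) h else 4 * 1.06 * sd(v) * length(v)^(-1 / 5)
}

# Bandwidths for the X and Y directions, in that order
h_xy <- c(safe_bandwidth(for_plot$X), safe_bandwidth(for_plot$Y))

plt <- ggplot(for_plot, aes(x = X, y = Y)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", colour = "black", h = h_xy) +
  scale_fill_gradient(low = "green", high = "red") +
  guides(alpha = "none") +
  geom_point()

# Building the layer data runs the density computation
contours <- layer_data(plt, 1)
```

Printing `plt` draws the figure. The Y bandwidth chosen here is only one reasonable value, and a larger or smaller second element of `h_xy` will smooth the contours more or less along the Y axis.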