Based on a value

Time: 2017-07-26 07:15:54

Tags: r data-visualization heatmap ggmap choropleth

I want to generate a choropleth map from the following data points:

  • Longitude
  • Latitude
  • Price

Here is the dataset: https://www.dropbox.com/s/0s05cl34bko7ggm/sample_data.csv?dl=0

I want the map to show the areas where prices are higher and the areas where they are lower. It should look something like this (sample image):

[image]

Here is my code:

library(ggmap)

map <- get_map(location = "austin", zoom = 9)
data <- read.csv(file.choose(), stringsAsFactors = FALSE)
data$average_rate_per_night <- as.numeric(gsub("[\\$,]", "", 
data$average_rate_per_night))
ggmap(map, extent = "device") + 
stat_contour( data = data, geom="polygon", 
            aes( x = longitude, y = latitude, z = average_rate_per_night, 
fill = ..level.. ) ) +
scale_fill_continuous( name = "Price", low = "yellow", high = "red" )

I get the following error message:

2: Computation failed in `stat_contour()`:
Contour requires single `z` at each combination of `x` and `y`. 

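I would greatly appreciate any help with fixing this, or any other approach to generating this type of heat map. Please note that I am interested in a heat map of the price, not of the density of records.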

2 answers:

Answer 0: (score: 3)

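If you insist on using the contour approach, you need to provide a value for every possible x, y coordinate combination in your data. To achieve this, I would strongly recommend gridding the space and generating a summary statistic for each bin.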
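As a small illustration of that requirement, here is a minimal base R sketch on synthetic points. The column names x, y, z and the 0.25 bin width are made up for the example and are not taken from the dataset. It bins the points, averages z per bin, and ends up with exactly one value for every bin combination:

```r
# Toy data: scattered points with a value, standing in for
# longitude / latitude / price (all names here are hypothetical)
set.seed(1)
pts <- data.frame(
  x = runif(50, 0, 1),
  y = runif(50, 0, 1),
  z = runif(50, 50, 500)
)

# Snap each point to the lower edge of a 0.25-wide bin
bin_width <- 0.25
pts$xbin <- floor(pts$x / bin_width) * bin_width
pts$ybin <- floor(pts$y / bin_width) * bin_width

# One summary value (the mean) per occupied bin
binned <- aggregate(z ~ xbin + ybin, data = pts, FUN = mean)

# Every bin combination, including the empty ones
breaks <- seq(0, 1 - bin_width, by = bin_width)
full_grid <- expand.grid(xbin = breaks, ybin = breaks)
grid <- merge(full_grid, binned, by = c("xbin", "ybin"), all.x = TRUE)

# Empty bins get 0 so that no x, y combination is left without a z
grid$z[is.na(grid$z)] <- 0

nrow(grid)                                  # 4 x 4 bins
anyDuplicated(grid[, c("xbin", "ybin")])    # 0 means one z per x, y pair
```

A table in this shape (a complete, regular grid with a single z per cell) is the kind of input the contour computation expects.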

I have attached a working example below, based on the data you provided:

library(ggmap)
library(data.table)

map <- get_map(location = "austin", zoom = 12)
data <- setDT(read.csv(file.choose(), stringsAsFactors = FALSE))

# convert the rate from string into numbers
data[, average_rate_per_night := as.numeric(gsub(",", "", 
       substr(average_rate_per_night, 2, nchar(average_rate_per_night))))]

# generate bins for the x, y coordinates
xbreaks <- seq(floor(min(data$latitude)), ceiling(max(data$latitude)), by = 0.01)
ybreaks <- seq(floor(min(data$longitude)), ceiling(max(data$longitude)), by = 0.01)

# allocate the data points into the bins
data$latbin <- xbreaks[cut(data$latitude, breaks = xbreaks, labels = FALSE)]
data$longbin <- ybreaks[cut(data$longitude, breaks = ybreaks, labels = FALSE)]

# Summarise the data for each bin
datamat <- data[, list(average_rate_per_night = mean(average_rate_per_night)), 
                 by = c("latbin", "longbin")]

# Merge the summarised data with all possible x, y coordinate combinations to get 
# a value for every bin
datamat <- merge(setDT(expand.grid(latbin = xbreaks, longbin = ybreaks)), datamat, 
                 by = c("latbin", "longbin"), all.x = TRUE, all.y = FALSE)

# Fill the empty bins with 0 to smooth the contour plot
datamat[is.na(average_rate_per_night), average_rate_per_night := 0]

# Plot the contours
ggmap(map, extent = "device") +
  stat_contour(data = datamat, aes(x = longbin, y = latbin, z = average_rate_per_night, 
               fill = ..level.., alpha = ..level..), geom = 'polygon', binwidth = 100) +
  scale_fill_gradient(name = "Price", low = "green", high = "red") +
  guides(alpha = "none")


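You can then play around with the bin size and the contour binwidth to get the desired result. You could additionally apply a smoothing function to the grid to get an even smoother contour plot.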
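The answer does not say which smoothing function to use. One simple option is a 3x3 moving average over the grid, sketched here in base R under that assumption. The helper name smooth_grid and the toy matrix are hypothetical:

```r
# Smooth a numeric matrix with a 3x3 moving average.
# Cells on the border average over the neighbours that exist.
smooth_grid <- function(m) {
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - 1):min(nrow(m), i + 1)
      cols <- max(1, j - 1):min(ncol(m), j + 1)
      out[i, j] <- mean(m[rows, cols])
    }
  }
  out
}

# Toy 5x5 grid with a single high-priced bin in the centre
m <- matrix(0, nrow = 5, ncol = 5)
m[3, 3] <- 900
s <- smooth_grid(m)
s
```

The single spike is spread over its neighbouring cells, which softens the sharp edges that zero-filled bins otherwise produce. To use this on the binned data, the long table would first have to be reshaped into a latitude-by-longitude matrix and converted back to long format before plotting.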

Answer 1: (score: 0)

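You could use the stat_summary_2d() or stat_summary_hex() function to achieve a similar result. These functions divide the data into bins (defined by x and y), and the z values in each bin are then summarised with a given function. In the example below I chose the mean as the aggregation function, so the map essentially shows the average price in each bin.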

Note: I needed to treat your average_rate_per_night variable properly in order to convert it to numeric (removing the $ sign and the commas).

library(ggmap)
library(data.table)

map <- get_map(location = "austin", zoom = 12)
data <- setDT(read.csv(file.choose(), stringsAsFactors = FALSE))
data[, average_rate_per_night := as.numeric(gsub(",", "",
    substr(average_rate_per_night, 2, nchar(average_rate_per_night))))]

ggmap(map, extent = "device") +
    stat_summary_2d(data = data, aes(x = longitude, y = latitude, 
        z = average_rate_per_night), fun = mean, alpha = 0.6, bins = 30) +
    scale_fill_gradient(name = "Price", low = "green", high = "red") 

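The price conversion mentioned in the note can be checked in isolation. The following is a minimal sketch with a hypothetical helper, parse_price, and made-up sample strings. Unlike the substr() approach, it does not assume that the first character is always the $ sign:

```r
# Hypothetical helper: strip "$" and thousands separators, then convert
parse_price <- function(x) as.numeric(gsub("[$,]", "", x))

# Made-up sample values in the same format as average_rate_per_night
prices <- c("$85.00", "$1,250.00", "$3,000")
parse_price(prices)
```

Running a check like this on a few raw values before plotting helps catch strings that would otherwise silently become NA.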