I want to make a hexbin plot of food poisoning data. I can do this easily with ggplot2 and geom_hex:
ggplot(df) + geom_hex(aes(x=longitude, y=latitude))
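See the code here: https://gist.github.com/corynissen/5823114
However, this only shows the frequency of food poisoning reports, which is misleading, because areas with more restaurants will naturally have more reports. So I want to normalize using the restaurant license data.
Basically, for each bin, I want the count from df divided by the count from lic (see the data/code at the link).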
Answer 0 (score: 0)
If a heat map is also acceptable, here is my quick-and-dirty solution:
# we don't want missing values in lat or lon
lic <- subset(lic, !is.na(longitude) & !is.na(latitude))
# get the x and y ranges for the union of both data sets
xmin <- min(c(df$longitude, lic$longitude))
xmax <- max(c(df$longitude, lic$longitude))
ymin <- min(c(df$latitude, lic$latitude))
ymax <- max(c(df$latitude, lic$latitude))
# set the number of bins and get x and y break points
n_bins <- 30
xbreaks <- seq(xmin, xmax, length=(n_bins+1))
ybreaks <- seq(ymin, ymax, length=(n_bins+1))
# get the 2d histogram of the food inspections set
# include.lowest=TRUE keeps points lying exactly on xmin/ymin; by default cut() uses
# (a, b] intervals and would turn those points into NA
v1 <- cut(df$longitude, breaks=xbreaks, include.lowest=TRUE) # creates a factor of length nrow(df)
v2 <- cut(df$latitude, breaks=ybreaks, include.lowest=TRUE) # creates a factor of length nrow(df)
A1 <- as.numeric(table(v1,v2)) # of length n_bins*n_bins
# get the 2d histogram of the business licenses set
v1 <- cut(lic$longitude, breaks=xbreaks, include.lowest=TRUE) # creates a factor of length nrow(lic)
v2 <- cut(lic$latitude, breaks=ybreaks, include.lowest=TRUE) # creates a factor of length nrow(lic)
A2 <- as.numeric(table(v1,v2)) # of length n_bins*n_bins
# let's normalize the data
A3 <- A1 / A2
# Inf: bins with reports but no licenses; NaN: bins where both counts are zero
A3[is.infinite(A3) | is.na(A3)] <- 0 # 2 values were infinite
# create the final data set: one row per bin, placed at the bin centre
xmid <- seq(xmin, xmax, length=(2*n_bins+1))[seq(2, (2*n_bins+1), by=2)]
ymid <- seq(ymin, ymax, length=(2*n_bins+1))[seq(2, (2*n_bins+1), by=2)]
df2 <- data.frame(longitude = rep(xmid, times=n_bins),
                  latitude = rep(ymid, each=n_bins),
                  count = A3)
# ...and visualize it
library(ggplot2)
ggplot() +
  geom_tile(data=df2, mapping=aes(x=longitude, y=latitude, fill=count))
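
For reuse, the steps above could be wrapped into a single function. The following is only a sketch under some assumptions: the name bin_ratio, the toy data frames df_toy and lic_toy, and the choice to mark bins without any licenses as NA (rather than 0) are mine and not part of the original answer. It relies on base R only and assumes both inputs have longitude and latitude columns with no missing values.

```r
# Hypothetical helper (not from the original answer): per-bin ratio of two
# point sets on a shared rectangular grid.
bin_ratio <- function(num, den, n_bins = 30) {
  # shared break points over the union of both data sets
  xbreaks <- seq(min(num$longitude, den$longitude),
                 max(num$longitude, den$longitude), length.out = n_bins + 1)
  ybreaks <- seq(min(num$latitude, den$latitude),
                 max(num$latitude, den$latitude), length.out = n_bins + 1)

  # 2d histogram as an n_bins x n_bins table (rows: longitude, cols: latitude)
  count2d <- function(d) {
    table(cut(d$longitude, breaks = xbreaks, include.lowest = TRUE),
          cut(d$latitude,  breaks = ybreaks, include.lowest = TRUE))
  }

  # element-wise ratio; flattening is column-major, so longitude varies fastest
  ratio <- as.numeric(count2d(num)) / as.numeric(count2d(den))
  ratio[!is.finite(ratio)] <- NA  # assumption: no denominator points means "unknown"

  # bin centres, repeated in the same order as the flattened tables
  xmid <- xbreaks[-1] - diff(xbreaks) / 2
  ymid <- ybreaks[-1] - diff(ybreaks) / 2
  data.frame(longitude = rep(xmid, times = n_bins),
             latitude  = rep(ymid, each  = n_bins),
             ratio     = ratio)
}

# toy data, made up purely for illustration
set.seed(1)
lic_toy <- data.frame(longitude = runif(2000, -87.9, -87.5),
                      latitude  = runif(2000, 41.6, 42.0))
df_toy  <- lic_toy[sample(nrow(lic_toy), 200), ]

res <- bin_ratio(df_toy, lic_toy, n_bins = 10)
```

The result has one row per bin and can be passed to geom_tile with fill=ratio in the same way as df2 above. Using NA instead of 0 for bins without licenses lets ggplot2 draw them as missing, which keeps "no restaurants here" visually distinct from "restaurants but no reports"; whether that is preferable depends on the map you want.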