ggplot2 geom_hex with two data sets

Time: 2013-06-20 14:23:47

Tags: r ggplot2

I want to make a hexbin plot of food poisoning data. I can do this easily with ggplot2 and geom_hex...

ggplot(df) + geom_hex(aes(x=longitude, y=latitude))

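See the code here... https://gist.github.com/corynissen/5823114

However, this only shows the raw frequency of food poisoning reports, which is misleading because areas with more restaurants will naturally have more reports. So I would like to normalize using restaurant license data.

Basically, for each bin, I want the count from df divided by the count from lic (see the data/code in the link).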
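The per-bin ratio described above can be sketched in base R on toy data. The coordinates, the `bin2d` helper and the two-by-two grid are hypothetical and only illustrate the idea: both data sets have to be binned with the same break points so that each bin covers the same area in both.

```r
# Toy coordinates (hypothetical values, not the real data sets)
reports  <- data.frame(longitude = c(0.1, 0.2, 0.7, 0.8, 0.9),
                       latitude  = c(0.1, 0.3, 0.6, 0.8, 0.9))
licenses <- data.frame(longitude = c(0.1, 0.2, 0.3, 0.4, 0.6, 0.9),
                       latitude  = c(0.2, 0.1, 0.4, 0.3, 0.7, 0.8))

# Shared break points: bin (i, j) means the same rectangle in both sets
breaks <- c(0, 0.5, 1)

# Hypothetical helper: 2D histogram as a table (rows = longitude bins)
bin2d <- function(d) {
  table(cut(d$longitude, breaks, include.lowest = TRUE),
        cut(d$latitude,  breaks, include.lowest = TRUE))
}

n_reports  <- bin2d(reports)
n_licenses <- bin2d(licenses)

# Element-wise ratio; bins without licenses come out as NaN or Inf
ratio <- n_reports / n_licenses
print(ratio)
```

Bins that contain no licenses need explicit handling before plotting, since the division yields NaN (0/0) or Inf (reports but no licenses).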

1 answer:

Answer 0: (score: 0)

If a heat map is also acceptable, here is my quick-and-dirty solution:

# we don't want missing values in lat or lon
lic <- subset(lic, !is.na(longitude) & !is.na(latitude))

# get the x and y ranges for the union of both data sets
xmin <- min(c(df$longitude, lic$longitude))
xmax <- max(c(df$longitude, lic$longitude))
ymin <- min(c(df$latitude,  lic$latitude))
ymax <- max(c(df$latitude,  lic$latitude))

# set the number of bins and get x and y break points
n_bins  <- 30
xbreaks <- seq(xmin, xmax, length.out=(n_bins+1))
ybreaks <- seq(ymin, ymax, length.out=(n_bins+1))

# get the 2d histogram of the food poisoning set
# include.lowest=TRUE keeps the point at the minimum break, which cut() would otherwise turn into NA
v1 <- cut(df$longitude, breaks=xbreaks, include.lowest=TRUE)  # factor of length nrow(df)
v2 <- cut(df$latitude,  breaks=ybreaks, include.lowest=TRUE)  # factor of length nrow(df)
A1 <- as.numeric(table(v1,v2))           # of length n_bins*n_bins, longitude bin varies fastest

# get the 2d histogram of the business licenses set
v1 <- cut(lic$longitude, breaks=xbreaks, include.lowest=TRUE) # factor of length nrow(lic)
v2 <- cut(lic$latitude,  breaks=ybreaks, include.lowest=TRUE) # factor of length nrow(lic)
A2 <- as.numeric(table(v1,v2))           # of length n_bins*n_bins, longitude bin varies fastest

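# let's normalize the data: reports per license in each bin
A3 <- A1 / A2
# Inf means reports but no licenses in the bin (2 bins were infinite!?),
# NaN means the bin is empty in both sets (0/0); set both to 0
A3[is.infinite(A3) | is.na(A3)] <- 0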

# create the final data set: one row per bin, located at the bin center
xmid <- (head(xbreaks, -1) + tail(xbreaks, -1)) / 2
ymid <- (head(ybreaks, -1) + tail(ybreaks, -1)) / 2
# longitude varies fastest, matching the order of as.numeric(table(v1,v2))
df2 <- data.frame(longitude = rep(xmid, times=n_bins),
                  latitude  = rep(ymid, each=n_bins),
                  rate      = A3)

# ...and visualize it
ggplot() +
   geom_tile(data=df2, mapping=aes(x=longitude, y=latitude, fill=rate))
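
If hexagons are still wanted rather than square tiles, the same idea may carry over to the hexbin package, which geom_hex also relies on. The following is only a sketch and not part of the original answer: it assumes hexbin and ggplot2 are installed, uses synthetic stand-ins for df and lic, and depends on both sets being binned with identical xbins, xbnds and ybnds so that cell IDs refer to the same hexagons.

```r
library(hexbin)
library(ggplot2)

set.seed(1)
# Synthetic stand-ins for the real data (hypothetical values)
lic <- data.frame(longitude = rnorm(5000, -87.7, 0.05),
                  latitude  = rnorm(5000,  41.9, 0.05))
df  <- lic[sample(nrow(lic), 300), ]   # reports drawn from licensed locations

# Common bounds so both hexagon grids are identical
xbnds <- range(c(df$longitude, lic$longitude))
ybnds <- range(c(df$latitude,  lic$latitude))

hb_df  <- hexbin(df$longitude,  df$latitude,  xbins = 30,
                 xbnds = xbnds, ybnds = ybnds)
hb_lic <- hexbin(lic$longitude, lic$latitude, xbins = 30,
                 xbnds = xbnds, ybnds = ybnds)

# One row per hexagon that contains at least one license
centers <- hcell2xy(hb_lic)
hex <- data.frame(longitude = centers$x,
                  latitude  = centers$y,
                  licenses  = hb_lic@count)

# Look up the report count of the same cell; cells without reports get 0
idx <- match(hb_lic@cell, hb_df@cell)
hex$reports <- ifelse(is.na(idx), 0, hb_df@count[idx])
hex$rate    <- hex$reports / hex$licenses

# Draw the precomputed hexagons instead of letting geom_hex count points
p <- ggplot(hex, aes(x = longitude, y = latitude, fill = rate)) +
  geom_hex(stat = "identity")
print(p)
```

Because the table is built from the license cells, hexagons that contain reports but no licenses are dropped rather than shown as infinite; whether that is acceptable depends on the data.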