Aggregating latitude, longitude, and count data for use with ggplot

Time: 2014-07-06 21:34:35

Tags: r google-maps ggplot2 heatmap tapply

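I have been given some customer data in latitude, longitude, and count format. All the data I need to create a ggplot heatmap is there, but I don't know how to get it into the format ggplot requires.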

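I am trying to aggregate the data by total count within blocks of 0.01 Lat by 0.01 Lon (a typical heatmap), and my first instinct was "tapply". That produces a nice summary at the block size I want, but in the wrong format. In addition, I would really like empty Lat or Lon blocks to be included as zeros, even when nothing falls in them; otherwise the heatmap ends up looking oddly streaky.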

Any help would be greatly appreciated.

I have created a subset of my data in the code below for your reference:

# m is the matrix of data provided
m = matrix(c(44.9591051,44.984884,44.984884,44.9811399,
           44.9969096,44.990894,44.9797023,44.983334,
          -93.3120017,-93.297668,-93.297668,-93.2993524,
          -93.2924484,-93.282462,-93.2738911,-93.26667,
          69,147,137,22,68,198,35,138), nrow=8, ncol=3) 
colnames(m) <- c("Lat", "Lon", "Count")
m <- as.data.frame(m)
s = as.data.frame((tapply(m$Count, list(round(m$Lon,2), round(m$Lat,2)), sum)))
s[is.na(s)] <- 0

# Data frame "s" has all the data, but not exactly in the format desired...
# First, it has a column for each latitude, instead of one column for Lon
# and one for Lat, and second, it needs to have 0 as the entry data for 
# Lat / Lon pairs that have no other data. As it is, there are only zeroes
# when one of the other entries has a Lat or Lon that matches... if there
# are no entries for a particular Lat or Lon value, then nothing at all is
# reported.

desired.format = matrix(c(44.96,44.96,44.96,44.96,44.96,
    44.97,44.97,44.97,44.97,44.97,44.98,44.98,44.98,
    44.98,44.98,44.99,44.99,44.99,44.99,44.99,45,45,
    45,45,45,-93.31,-93.3,-93.29,-93.28,-93.27,-93.31,
    -93.3,-93.29,-93.28,-93.27,-93.31,-93.3,-93.29,
    -93.28,-93.27,-93.31,-93.3,-93.29,-93.28,-93.27,
    -93.31,-93.3,-93.29,-93.28,-93.27,69,0,0,0,0,0,0,
    0,0,0,0,306,0,0,173,0,0,0,198,0,0,0,68,0,0),
    nrow=25, ncol=3)

colnames(desired.format) <- c("Lat", "Lon", "Count")
desired.format <- as.data.frame(desired.format)

library(ggmap)  # provides get_map() and ggmap(); also loads ggplot2
minneapolis = get_map(location = "minneapolis, mn", zoom = 12)
ggmap(minneapolis) + geom_tile(data = desired.format, aes(x = Lon, y = Lat, alpha = Count), fill="red")
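
For reference, the long format with explicit zeros that the question describes can be sketched in base R. This is only one possible approach, not the method used in the answer below: it bins on integer hundredths of a degree (to sidestep floating-point mismatches when joining), builds the full grid with expand.grid, and left-joins the summed counts. The names lat_bin, lon_bin, agg, grid, and binned are illustrative assumptions.

```r
# Same sample data as in the question, built directly as a data frame.
m <- data.frame(
  Lat   = c(44.9591051, 44.984884, 44.984884, 44.9811399,
            44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon   = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
            -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Bin on integer hundredths of a degree so the join keys compare exactly.
m$lat_bin <- as.integer(round(m$Lat * 100))
m$lon_bin <- as.integer(round(m$Lon * 100))

# Sum the counts within each occupied 0.01 x 0.01 cell.
agg <- aggregate(Count ~ lat_bin + lon_bin, data = m, FUN = sum)

# Every cell inside the bounding box, whether or not it holds data.
grid <- expand.grid(
  lat_bin = seq(min(m$lat_bin), max(m$lat_bin)),
  lon_bin = seq(min(m$lon_bin), max(m$lon_bin))
)

# Left join on the shared key columns, then fill empty cells with zero.
binned <- merge(grid, agg, all.x = TRUE)
binned$Count[is.na(binned$Count)] <- 0

# Convert the integer bins back to degrees and keep the three plot columns.
binned$Lat <- binned$lat_bin / 100
binned$Lon <- binned$lon_bin / 100
binned <- binned[order(binned$Lat, binned$Lon), c("Lat", "Lon", "Count")]
```

The resulting binned data frame has one row per grid cell and could be passed to geom_tile in place of the hand-built desired.format.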

1 Answer:

Answer 0 (score: 3)

Here is a stab at it using geom_hex and stat_density2d. The idea of making bins by truncating coordinates makes me a bit uneasy.

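What you have is count data at given lat/longs, which means you would ideally want a weight parameter, but as far as I know that is not implemented for geom_hex. Instead, we hack around it by repeating each row as many times as its count variable says, similar to the approach here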

  library(ggplot2)  # geom_hex additionally needs the hexbin package installed
  ## hack job to repeat records to full count
  m<-as.data.frame(m)
  m_long <- with(m, m[rep(1:nrow(m), Count),])
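
A quick way to convince yourself that the row-repetition trick does what is intended is to run it on a tiny example. The sketch below uses made-up toy values (toy, idx, and toy_long are hypothetical names, not part of the answer) and assumes Count holds non-negative whole numbers.

```r
# Toy data with hypothetical values, not the question's data.
toy <- data.frame(
  Lat   = c(44.96, 44.98),
  Lon   = c(-93.31, -93.30),
  Count = c(2, 3)
)

# Repeat each row index Count times, then index the data frame with it.
idx <- rep(seq_len(nrow(toy)), times = toy$Count)
toy_long <- toy[idx, ]

# Each original row should now appear exactly Count times.
nrow(toy_long) == sum(toy$Count)
table(idx)
```

Because every observation becomes Count identical points, density and binning layers then see the counts as if they were individual records.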


  ## stat_density2d
  ggplot(m_long, aes(Lat, Lon)) + 
  stat_density2d(aes(alpha=..level.., fill=..level..), size=2, 
                 bins=10, geom=c("polygon","contour")) + 
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(colour="black", bins=10) +
  geom_point(data = m_long)


  ## geom_hex alternative
  bins=6
  ggplot(m_long, aes(Lat, Lon)) + 
  geom_hex(bins=bins)+
  coord_equal(ratio = 1/1)+
  scale_fill_gradient(low = "blue", high = "red") +
  geom_point(data = m_long,position = "jitter")+
  stat_binhex(aes(label=..count..,size=..count..*.5), size=3.5,geom="text", bins=bins, colour="white")

These produce, respectively, the density plot and the binned (hexbin) version.
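
As an aside on the weighting point above: to my knowledge, more recent ggplot2 releases let the hexbin stat take a weight aesthetic, which would avoid expanding the rows at all. Treat this as an assumption to check against your installed version rather than as part of the original answer. A minimal sketch using the unexpanded data:

```r
library(ggplot2)  # geom_hex also needs the hexbin package installed

# Same sample data as in the question, one row per location.
m <- data.frame(
  Lat   = c(44.9591051, 44.984884, 44.984884, 44.9811399,
            44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon   = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
            -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Assumption: the hexbin stat sums the weight aesthetic within each hexagon.
p <- ggplot(m, aes(x = Lon, y = Lat)) +
  geom_hex(aes(weight = Count), bins = 6) +
  scale_fill_gradient(low = "blue", high = "red")

# Inspect the computed bins; the per-hexagon totals are in the count column.
hex_data <- layer_data(p)
sum(hex_data$count) == sum(m$Count)
```

If the comparison on the last line is FALSE on your setup, the weight aesthetic is probably being ignored and the row-repetition approach is the safer route.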

Edit:

With a base map:

# "map" is assumed to be the ggmap base layer, e.g. map <- ggmap(minneapolis)
map + 
  stat_density2d(data = m_long, aes(x = Lon, y = Lat,
alpha=..level.., fill=..level..), 
                 size=2, 
                 bins=10, 
                 geom=c("polygon","contour"),
                 inherit.aes=FALSE) + 
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(data = m_long, aes(x = Lon, y=Lat),
                 colour="black", bins=10,inherit.aes=FALSE) +
  geom_point(data = m_long, aes(x = Lon, y=Lat),inherit.aes=FALSE)


## and the hexbin map...

map + #ggplot(m_long, aes(Lat, Lon)) + 
  geom_hex(bins=bins,data = m_long, aes(x = Lon, y = Lat),alpha=.5,
                 inherit.aes=FALSE) + 
  geom_point(data = m_long, aes(x = Lon, y=Lat),
             inherit.aes=FALSE,position = "jitter")+
  scale_fill_gradient(low = "blue", high = "red")
