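Wrong districts filled in state map

Time: 2018-03-15 19:47:25

Tags: r ggplot2 shapefile rgdal

I have a shapefile of school districts in Texas, and I'm trying to use ggplot2 to highlight 10 of them. I've tinkered with it and gotten everything set up, but when I spot-checked the result I realized that the 10 highlighted districts are not actually the ones I wanted to highlight.

The shapefile can be downloaded from this link on the Texas Education Agency Public Open Data Site.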

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]

# turn shapefile into df
tex_df <- fortify(tex)

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))


# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")

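As you can see, when the plot is created it looks like it does exactly what I want. The problem is that the ten highlighted districts are not the ones in the districts vector above. I've re-run everything from a clean session several times, double-checked that I have no factor/character conversion problem, and confirmed in the web data explorer that the IDs I'm getting from the shapefile are indeed the ones that should match my list of names. I really have no idea where this problem could be coming from.

This is my first time working with shapefiles and rgdal, so if I had to guess, there's something simple about the structure that I don't understand, and hopefully one of you can quickly point it out for me. Thanks!

Here's the output: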

enter image description here

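2 answers:

Answer 0: (score: 1)

Alternative 1

Pass the region argument to fortify(), set to "NAME2"; the id column will then contain your district names. Then create the dummy fill variable based on that column. I'm not familiar with Texas school districts, but I believe the result is right.

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))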

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))

# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")
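
One plausible explanation for the original mismatch, sketched here with made-up data: without region, fortify() labels each polygon with its internal polygon ID (with readOGR() these are typically "0", "1", ...), not with any column of the attribute table. If the FID column does not coincide with those IDs, filtering on FID picks the wrong polygons. The FID values below are an assumption for illustration, not taken from the actual shapefile.

```r
# Toy stand-in for tex@data; the FID values are hypothetical
attrs <- data.frame(
  NAME2 = c("Aldine", "Laredo", "Donna", "Bryan"),
  FID   = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Polygon IDs as fortify() would report them without `region`: "0", "1", ...
attrs$id <- as.character(seq_len(nrow(attrs)) - 1)

districts <- c("Laredo", "Bryan")
wanted <- attrs$NAME2 %in% districts

# Matching fortify()'s id against FID selects the wrong rows
by_fid <- attrs$NAME2[attrs$id %in% as.character(attrs$FID[wanted])]

# Matching fortify()'s id against the polygon IDs selects the intended rows
by_id <- attrs$NAME2[attrs$id %in% attrs$id[wanted]]

print(by_fid)  # "Donna"
print(by_id)   # "Laredo" "Bryan"
```

Passing region = "NAME2" sidesteps the issue because the id column then holds the names themselves.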

enter image description here

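Alternative 2

This one does not pass the region argument to fortify(), which addresses the problem seeellayewhy had implementing the previous alternative. We add two layers, with no need to create a dummy variable or merge any data frames.

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))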

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# Subset the shapefile into two: highlighted districts and the rest
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts))

# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)

# Plot two geom_polygon layers, one for each data frame.
# The fill is constant within each layer, so it is set outside aes()
# and no manual scale or legend handling is needed.
ggplot() +
  geom_polygon(data = tex_df1,
               aes(x = long, y = lat, group = group),
               fill = cols[2], color = "#CCCCCC") +
  geom_polygon(data = tex_df2,
               aes(x = long, y = lat, group = group),
               fill = cols[1]) +
  theme_void()

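Answer 1: (score: 0)

When trying to implement @mpalanco's solution of adding the region argument to fortify(), I got an error that I could not resolve with the help of the many other Stack posts about it (Error: isTRUE(gpclibPermitStatus()) is not TRUE). I also tried broom::tidy(), which is the non-deprecated equivalent of fortify(), and hit the same error.

In the end I implemented @luchanocho's solution from here. I don't like that it uses seq() to generate the IDs, since that doesn't necessarily preserve the right order, but my case is simple enough that I could go through each district and confirm the highlighting is correct.

My code is below. The output is identical to @mpalanco's answer. Since he clearly got the right result, and his way of implementing the solution is less sketchy, I'll accept his answer on the assumption that it works. If anyone else hits the same error, the solution below can be considered a workaround.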

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")


# convert shapefile to a df
tex_df <- fortify(tex)

# generate temp df with IDs to merge back in
names_df <- data.frame(tex@data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1)  # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")

# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))


ggplot(data=final) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")
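
A possibly less fragile variant of the seq() step, sketched on toy data: take the IDs from the row names of the attribute table instead of generating them. This assumes the row names of tex@data match the polygon IDs that fortify() emits, which is how sp normally builds these objects but is worth checking on the real data. The toy frames below are hypothetical stand-ins for tex@data and fortify(tex). Because merge() may reorder rows, the result is re-sorted by id and by fortify()'s order column so each polygon's vertices stay in drawing order.

```r
# Hypothetical stand-in for tex@data; row names play the role of polygon IDs
attrs <- data.frame(
  NAME2 = c("Aldine", "Laredo", "Donna"),
  row.names = c("0", "1", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for fortify(tex): three vertices per polygon
verts <- data.frame(
  id    = rep(c("0", "1", "2"), each = 3),
  order = rep(1:3, times = 3),
  long  = c(0, 1, 0, 2, 3, 2, 4, 5, 4),
  lat   = c(0, 0, 1, 0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Build the lookup from the row names rather than from seq()
names_df <- data.frame(id = rownames(attrs), NAME2 = attrs$NAME2,
                       stringsAsFactors = FALSE)

final <- merge(verts, names_df, by = "id")
final <- final[order(final$id, final$order), ]  # restore vertex order

final$yes <- as.factor(ifelse(final$NAME2 %in% "Laredo", 1, 0))
```

On the real object the lookup would presumably be built from rownames(tex@data) and tex@data$NAME2 in the same way.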