地图,ggplot2,按州填写缺少地图上的某些区域

时间:2014-10-24 03:20:39

标签: r plot ggplot2 maps

我正与mapsggplot2合作,想象不同年份每个州的某些罪行的数量。我正在使用的数据集是由FBI制作的,可以从他们的网站或here下载(如果你不想下载数据集,我不会责备你,但是复制并粘贴到这个问题中太大了,包括一小部分数据集也无济于事,因为没有足够的信息来重新创建图表。

问题比描述的更容易看出。

Robbery by state

正如你所看到的,加利福尼亚州缺少一大块以及其他一些州。以下是生成此图的代码:

# load libraries
library(maps)
library(ggplot2)

# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")

# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")

# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
  facets=~Year, fill=Robbery, group=group)

这是states数据的样子:

    long      lat     group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

这就是fbi数据的样子:

    Year Population Violent Property Murder Forcible.Rape Robbery
1 1960    3266740    6097    33823    406           281     898
2 1961    3302000    5564    32541    427           252     630
3 1962    3358000    5283    35829    316           218     754
4 1963    3347000    6115    38521    340           192     828
5 1964    3407000    7260    46290    316           397     992
6 1965    3462000    6916    48215    395           367     992
   Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr   state region
1               4512    11626         19344          2853   AL Alabama  alabama
2               4255    11205         18801          2535   AL Alabama  alabama
3               3995    11722         21306          2801   AL Alabama  alabama
4               4755    12614         22874          3033   AL Alabama  alabama
5               5555    15898         26713          3679   AL Alabama  alabama
6               5162    16398         28115          3702   AL Alabama  alabama

然后我合并了region的两个集合。我试图绘制的子集是

      region Year Robbery      long      lat group
8283 alabama 2012    5020 -87.46201 30.38968     1
8284 alabama 2012    5020 -87.48493 30.37249     1
8285 alabama 2012    5020 -87.95475 30.24644     1
8286 alabama 2012    5020 -88.00632 30.24071     1
8287 alabama 2012    5020 -88.01778 30.25217     1
8288 alabama 2012    5020 -87.52503 30.37249     1
       ...            ...    ...      ...

关于如何在没有那些丑陋的缺失点的情况下创建这个情节的任何想法?

2 个答案:

答案 0 :(得分:8)

我玩了你的代码。我能说的一件事是当你使用merge时发生的事情。我使用geom_path绘制了状态图,并确认原始地图数据中不存在一些奇怪的线。然后,我通过玩mergeinner_join进一步调查此案例。 mergeinner_join在这里做同样的工作。但是,我发现了一个区别。当我使用merge时,订单已更改;数字不是正确的顺序。 inner_join不是这种情况。您将在下面看到加利福尼亚州的一些数据。你的方法是对的。但merge不知何故对你不利。不过,我不确定为什么函数改变了顺序。

library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)


### Check if both files have identical state names: The answer is NO
### states$region does not have Alaska, Hawaii, and Washington D.C.
### fbi$state does not have District of Columbia.

setdiff(fbi$state, states$region)
#[1] "alaska"           "hawaii"           "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
        filter(Year == 2012) %>%
        select(state, Robbery)  

现在,我使用mergeinner_join创建了两个数据框。

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))

ana %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -119.8685 38.90956     4   676      <NA>
#2  california   56521 -119.5706 38.69757     4   677      <NA>
#3  california   56521 -119.3299 38.53141     4   678      <NA>
#4  california   56521 -120.0060 42.00927     4   667      <NA>
#5  california   56521 -120.0060 41.20139     4   668      <NA>

bob %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -120.0060 42.00927     4   667      <NA>
#2  california   56521 -120.0060 41.20139     4   668      <NA>
#3  california   56521 -120.0060 39.70024     4   669      <NA>
#4  california   56521 -119.9946 39.44241     4   670      <NA>
#5  california   56521 -120.0060 39.31636     4   671      <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
geom_polygon()

enter image description here

答案 1 :(得分:1)

问题在于合并

的参数顺序
fbimap <- merge(fbi, states, by="region")

首先是主题数据,然后是地理数据。使用

切换订单
fbimap <- merge(states, fbi, by="region")

多边形应该全部关闭。