根据数据框,当值匹配时,将值添加到其他数据框

时间:2020-03-27 00:21:23

标签: r data-science choropleth

这很难解释,但是基本上我有一个非常简单的带有县和病例的数据框

dat <- "County   Cases
1       Borden   5
2       Bosque   3
3       Bowue    1"

并且我有一个来自TEX <- map_data('county', 'texas')的大型数据框。

> head(TEX)
       long      lat group order region subregion
1 -95.75271 31.53560     1     1  texas  anderson
2 -95.76989 31.55852     1     2  texas  anderson
3 -95.76416 31.58143     1     3  texas  anderson
4 -95.72979 31.58143     1     4  texas  anderson
5 -95.74698 31.61008     1     5  texas  anderson
6 -95.72405 31.63873     1     6  texas  anderson

我想做的是检查每一行,如果子区域位于数据框 dat 中,然后将相应数量的案例添加到 TEX 中名为“案例”;否则请添加0。

例如

> head(TEX)
       long      lat group order region subregion cases
1 -95.75271 31.53560     1     1  texas  anderson 0
2 -95.76989 31.55852     1     2  texas  anderson 0
3 -95.76416 31.58143     1     3  texas  anderson 0
4 -95.72979 31.58143     1     4  texas  anderson 0
5 -95.74698 31.61008     1     5  texas  Borden   5
6 -95.72405 31.63873     1     6  texas  Bosque   3

我尝试用这段代码来完成

for (val in counties$counties) {
     for (vall in TEX$subregion) {
         if (val == vall) TEX$cases = counties$cases
     }
}

但是我得到这个错误

Error in `$<-.data.frame`(`*tmp*`, "cases", value = c(5L, 3L, 2L, 1L,  : 
  replacement has 10 rows, data has 4488

我在这里的最终目标是,根据我不断增长的县和案例列表,创建一个拥有COVID案例的德克萨斯州的州长。如果您有比我更好的方法,那么我会接受的!

致谢!

更新:Ian的解决方案效果很好,但是却导致ggplot和映射出现问题。如果我在合并之前先取下数据框TEX的一部分,则它看起来像这样

6   -96.81268   28.28693    4   76  texas   aransas
77  -96.80695   28.25828    4   77  texas   aransas
78  -96.82414   28.21817    4   78  texas   aransas
79  -96.87570   28.19525    4   79  texas   aransas
80  -96.91009   28.16660    4   80  texas   aransas
81  -96.94446   28.14942    4   81  texas   aransas
82  -96.94446   28.18379    4   82  texas   aransas
83  -96.92727   28.24109    4   83  texas   aransas
84  -96.92154   28.26974    4   84  texas   aransas
85  -96.94446   28.27547    4   85  texas   aransas
86  -96.99030   28.25255    4   86  texas   aransas
87  -96.98457   28.23536    4   87  texas   aransas
88  -96.97311   28.21817    4   88  texas   aransas
89  -96.96165   28.19525    4   89  texas   aransas
90  -96.97311   28.17233    4   90  texas   aransas
91  -97.00175   28.15515    4   91  texas   aransas
92  -97.03613   28.15515    4   92  texas   aransas
93  -97.04186   28.17233    4   93  texas   aransas
94  -97.03613   28.20098    4   94  texas   aransas
95  -97.05905   28.21817    4   95  texas   aransas
96  -97.07624   28.20671    4   96  texas   aransas
97  -97.11062   28.21817    4   97  texas   aransas
98  -97.12780   28.23536    4   98  texas   aransas
99  -97.12780   28.25255    4   99  texas   aransas
100 -97.11062   28.26401    4   100 texas   aransas
101 -97.01894   28.27547    4   101 texas   aransas
102 -96.80122   28.31557    4   102 texas   aransas

以及绘制后

ggplot(TEX, aes(long,lat, group = group)) + geom_polygon(aes(fill = subregion),color = "black") + theme(legend.position = "none") + coord_quickmap()

what

太棒了!现在,当我执行合并功能时,TEX将重新排列

72  aransas -97.00175   28.15515    4   91  texas   1
73  aransas -97.04186   28.17233    4   93  texas   1
74  aransas -96.80695   28.25828    4   77  texas   1
75  aransas -96.80122   28.31557    4   102 texas   1
76  aransas -97.03613   28.15515    4   92  texas   1
77  aransas -96.81268   28.28693    4   76  texas   1
78  aransas -97.12780   28.25255    4   99  texas   1
79  aransas -97.11062   28.26401    4   100 texas   1
80  aransas -96.97311   28.17233    4   90  texas   1
81  aransas -97.12780   28.23536    4   98  texas   1
82  aransas -97.07624   28.20671    4   96  texas   1
83  aransas -96.94446   28.27547    4   85  texas   1
84  aransas -97.01894   28.27547    4   101 texas   1
85  aransas -96.96165   28.19525    4   89  texas   1
86  aransas -97.11062   28.21817    4   97  texas   1
87  aransas -96.87570   28.19525    4   79  texas   1
88  aransas -97.03613   28.20098    4   94  texas   1
89  aransas -97.05905   28.21817    4   95  texas   1
90  aransas -96.97311   28.21817    4   88  texas   1
91  aransas -96.92154   28.26974    4   84  texas   1
92  aransas -96.99030   28.25255    4   86  texas   1
93  aransas -96.98457   28.23536    4   87  texas   1
94  aransas -96.82414   28.21817    4   78  texas   1
95  aransas -96.80122   28.31557    4   75  texas   1
96  aransas -96.94446   28.14942    4   81  texas   1
97  aransas -96.91009   28.16660    4   80  texas   1
98  aransas -96.92727   28.24109    4   83  texas   1
99  aransas -96.94446   28.18379    4   82  texas   1

现在地图看起来像这样... enter image description here

如何保存TEX的原始顺序?或等待,也许我只需要按顺序排序即可。...

UPDATE#2

TEX <- TEX[order(TEX$order),]

解决了问题。我很好奇为什么合并会这样改变顺序

1 个答案:

答案 0 :(得分:0)

我们可以在底数R中使用merge

result <- merge(TEX,dat,by.x="subregion",by.y="County",all.x=TRUE)
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas    NA
2  anderson -95.76989 31.55852     1     2  texas    NA
3  anderson -95.76416 31.58143     1     3  texas    NA
4  anderson -95.72979 31.58143     1     4  texas    NA
5  anderson -95.74698 31.61008     1     5  texas    NA
6  anderson -95.72405 31.63873     1     6  texas    NA
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3

然后我们可以将NA替换为0

result$Cases[is.na(result$Cases)] <- 0
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas     0
2  anderson -95.76989 31.55852     1     2  texas     0
3  anderson -95.76416 31.58143     1     3  texas     0
4  anderson -95.72979 31.58143     1     4  texas     0
5  anderson -95.74698 31.61008     1     5  texas     0
6  anderson -95.72405 31.63873     1     6  texas     0
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3

数据

TEX <- structure(list(long = c(-95.75271, -95.76989, -95.76416, -95.72979, 
-95.74698, -95.72405, -95.74698, -95.72405), lat = c(31.5356, 
31.55852, 31.58143, 31.58143, 31.61008, 31.63873, 31.61008, 31.63873
), group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), order = c(1L, 2L, 
3L, 4L, 5L, 6L, 5L, 6L), region = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "texas", class = "factor"), subregion = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 3L), .Label = c("anderson", "Borden", 
"Bosque"), class = "factor")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

dat <- structure(list(County = structure(1:3, .Label = c("Borden", "Bosque", 
"Bowue"), class = "factor"), Cases = c(5L, 3L, 1L)), class = "data.frame", row.names = c("1", 
"2", "3"))