这很难解释,但是基本上我有一个非常简单的带有县和病例的数据框
dat <- "County Cases
1 Borden 5
2 Bosque 3
3 Bowue 1"
并且我有一个来自TEX <- map_data('county', 'texas')
的大型数据框。
> head(TEX)
long lat group order region subregion
1 -95.75271 31.53560 1 1 texas anderson
2 -95.76989 31.55852 1 2 texas anderson
3 -95.76416 31.58143 1 3 texas anderson
4 -95.72979 31.58143 1 4 texas anderson
5 -95.74698 31.61008 1 5 texas anderson
6 -95.72405 31.63873 1 6 texas anderson
我想做的是检查每一行,如果子区域位于数据框 dat 中,然后将相应数量的案例添加到 TEX 中名为“案例”;否则请添加0。
例如
> head(TEX)
long lat group order region subregion cases
1 -95.75271 31.53560 1 1 texas anderson 0
2 -95.76989 31.55852 1 2 texas anderson 0
3 -95.76416 31.58143 1 3 texas anderson 0
4 -95.72979 31.58143 1 4 texas anderson 0
5 -95.74698 31.61008 1 5 texas Borden 5
6 -95.72405 31.63873 1 6 texas Bosque 3
我尝试用这段代码来完成
for (val in counties$counties) {
for (vall in TEX$subregion) {
if (val == vall) TEX$cases = counties$cases
}
}
但是我得到这个错误
Error in `$<-.data.frame`(`*tmp*`, "cases", value = c(5L, 3L, 2L, 1L, :
replacement has 10 rows, data has 4488
我在这里的最终目标是,根据我不断增长的县和案例列表,创建一个拥有COVID案例的德克萨斯州的州长。如果您有比我更好的方法,那么我会接受的!
致谢!
更新:Ian的解决方案效果很好,但是却导致ggplot和映射出现问题。如果我在合并之前先取下数据框TEX的一部分,则它看起来像这样
6 -96.81268 28.28693 4 76 texas aransas
77 -96.80695 28.25828 4 77 texas aransas
78 -96.82414 28.21817 4 78 texas aransas
79 -96.87570 28.19525 4 79 texas aransas
80 -96.91009 28.16660 4 80 texas aransas
81 -96.94446 28.14942 4 81 texas aransas
82 -96.94446 28.18379 4 82 texas aransas
83 -96.92727 28.24109 4 83 texas aransas
84 -96.92154 28.26974 4 84 texas aransas
85 -96.94446 28.27547 4 85 texas aransas
86 -96.99030 28.25255 4 86 texas aransas
87 -96.98457 28.23536 4 87 texas aransas
88 -96.97311 28.21817 4 88 texas aransas
89 -96.96165 28.19525 4 89 texas aransas
90 -96.97311 28.17233 4 90 texas aransas
91 -97.00175 28.15515 4 91 texas aransas
92 -97.03613 28.15515 4 92 texas aransas
93 -97.04186 28.17233 4 93 texas aransas
94 -97.03613 28.20098 4 94 texas aransas
95 -97.05905 28.21817 4 95 texas aransas
96 -97.07624 28.20671 4 96 texas aransas
97 -97.11062 28.21817 4 97 texas aransas
98 -97.12780 28.23536 4 98 texas aransas
99 -97.12780 28.25255 4 99 texas aransas
100 -97.11062 28.26401 4 100 texas aransas
101 -97.01894 28.27547 4 101 texas aransas
102 -96.80122 28.31557 4 102 texas aransas
以及绘制后
ggplot(TEX, aes(long,lat, group = group)) + geom_polygon(aes(fill = subregion),color = "black") + theme(legend.position = "none") + coord_quickmap()
太棒了!现在,当我执行合并功能时,TEX将重新排列
72 aransas -97.00175 28.15515 4 91 texas 1
73 aransas -97.04186 28.17233 4 93 texas 1
74 aransas -96.80695 28.25828 4 77 texas 1
75 aransas -96.80122 28.31557 4 102 texas 1
76 aransas -97.03613 28.15515 4 92 texas 1
77 aransas -96.81268 28.28693 4 76 texas 1
78 aransas -97.12780 28.25255 4 99 texas 1
79 aransas -97.11062 28.26401 4 100 texas 1
80 aransas -96.97311 28.17233 4 90 texas 1
81 aransas -97.12780 28.23536 4 98 texas 1
82 aransas -97.07624 28.20671 4 96 texas 1
83 aransas -96.94446 28.27547 4 85 texas 1
84 aransas -97.01894 28.27547 4 101 texas 1
85 aransas -96.96165 28.19525 4 89 texas 1
86 aransas -97.11062 28.21817 4 97 texas 1
87 aransas -96.87570 28.19525 4 79 texas 1
88 aransas -97.03613 28.20098 4 94 texas 1
89 aransas -97.05905 28.21817 4 95 texas 1
90 aransas -96.97311 28.21817 4 88 texas 1
91 aransas -96.92154 28.26974 4 84 texas 1
92 aransas -96.99030 28.25255 4 86 texas 1
93 aransas -96.98457 28.23536 4 87 texas 1
94 aransas -96.82414 28.21817 4 78 texas 1
95 aransas -96.80122 28.31557 4 75 texas 1
96 aransas -96.94446 28.14942 4 81 texas 1
97 aransas -96.91009 28.16660 4 80 texas 1
98 aransas -96.92727 28.24109 4 83 texas 1
99 aransas -96.94446 28.18379 4 82 texas 1
如何保存TEX的原始顺序?或等待,也许我只需要按顺序排序即可。...
UPDATE#2
TEX <- TEX[order(TEX$order),]
解决了问题。我很好奇为什么合并会这样改变顺序
答案 0 :(得分:0)
我们可以在底数R中使用merge
。
result <- merge(TEX,dat,by.x="subregion",by.y="County",all.x=TRUE)
result
subregion long lat group order region Cases
1 anderson -95.75271 31.53560 1 1 texas NA
2 anderson -95.76989 31.55852 1 2 texas NA
3 anderson -95.76416 31.58143 1 3 texas NA
4 anderson -95.72979 31.58143 1 4 texas NA
5 anderson -95.74698 31.61008 1 5 texas NA
6 anderson -95.72405 31.63873 1 6 texas NA
7 Borden -95.74698 31.61008 1 5 texas 5
8 Bosque -95.72405 31.63873 1 6 texas 3
然后我们可以将NA
替换为0
。
result$Cases[is.na(result$Cases)] <- 0
result
subregion long lat group order region Cases
1 anderson -95.75271 31.53560 1 1 texas 0
2 anderson -95.76989 31.55852 1 2 texas 0
3 anderson -95.76416 31.58143 1 3 texas 0
4 anderson -95.72979 31.58143 1 4 texas 0
5 anderson -95.74698 31.61008 1 5 texas 0
6 anderson -95.72405 31.63873 1 6 texas 0
7 Borden -95.74698 31.61008 1 5 texas 5
8 Bosque -95.72405 31.63873 1 6 texas 3
数据
TEX <- structure(list(long = c(-95.75271, -95.76989, -95.76416, -95.72979,
-95.74698, -95.72405, -95.74698, -95.72405), lat = c(31.5356,
31.55852, 31.58143, 31.58143, 31.61008, 31.63873, 31.61008, 31.63873
), group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), order = c(1L, 2L,
3L, 4L, 5L, 6L, 5L, 6L), region = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "texas", class = "factor"), subregion = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 3L), .Label = c("anderson", "Borden",
"Bosque"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
dat <- structure(list(County = structure(1:3, .Label = c("Borden", "Bosque",
"Bowue"), class = "factor"), Cases = c(5L, 3L, 1L)), class = "data.frame", row.names = c("1",
"2", "3"))