I have a sample dataset, shown below:
Town_From<-c("A","A","A","B","B","C")
Town_To<-c("B","C","D","C","D","D")
Distance<-c(10,5,18,17,20,21)
Df<-data.frame(Town_From,Town_To,Distance)
Town_From Town_To Distance
A B 10
A C 5
A D 18
B C 17
B D 20
C D 21
I have another data frame (Df2) that holds the population of each town:
Town<-c("A","B","C","D")
Population<-c(1000,800,500,200)
Df2<-data.frame(Town,Population)
Town Population
A 1000
B 800
C 500
D 200
What I need is a computed column, "Pop_within_Distance":
Town_From Town_To Distance Pop_within_Distance
A B 10 2300
A C 5 1500
A D 18 2500
B C 17 1300
B D 20 1500
C D 21 700
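Town_From is my origin. I need to compute, as "Pop_within_Distance", the total population of the towns that lie within the circle centered on "Town_From" whose radius is the distance to "Town_To".
For example:
In row 1, "Pop_within_Distance" = Pop_A + Pop_B + Pop_C = 1000 + 800 + 500 = 2300 (because towns A, B and C lie within a circle of radius 10 around town A).
In row 4, "Pop_within_Distance" = Pop_B + Pop_C = 800 + 500 = 1300 (because only towns B and C lie within a circle of radius 17 around town B).
How can I compute this in R?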
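Answer 0: (score: 0)
You can do this with dplyr, provided the Town_From, Town_To and Town columns in your data frames are first converted to character rather than factor (or are factors with the same levels):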
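library(dplyr)
# Attach each destination town's population, then accumulate it by distance within each origin
Df <- Df %>% left_join(Df2, by = c("Town_To" = "Town")) %>%
  group_by(Town_From) %>%
  arrange(Distance) %>%
  # cumsum() covers the destinations; the second term adds the origin town's own population
  mutate(Pop_within_Distance = cumsum(Population) + Df2$Population[Df2$Town %in% Town_From]) %>%
  select(-Population) %>% arrange(Town_From, Town_To)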
##Source: local data frame [6 x 4]
##Groups: Town_From [3]
##
## Town_From Town_To Distance Pop_within_Distance
## <chr> <chr> <dbl> <dbl>
##1 A B 10 2300
##2 A C 5 1500
##3 A D 18 2500
##4 B C 17 1300
##5 B D 20 1500
##6 C D 21 700
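If the town columns were created as factors (the default for data.frame() before R 4.0.0), the conversion mentioned above could be done along these lines. This is only a sketch: the helper to_chr and the objects Df_f, Df2_f, Df_chr and Df2_chr are hypothetical names introduced for illustration, and stringsAsFactors = TRUE is set here only to reproduce the factor case.

```r
# Hypothetical setup: force factor columns to mimic the pre-4.0.0 default
Df_f <- data.frame(Town_From = c("A", "A", "A", "B", "B", "C"),
                   Town_To   = c("B", "C", "D", "C", "D", "D"),
                   Distance  = c(10, 5, 18, 17, 20, 21),
                   stringsAsFactors = TRUE)
Df2_f <- data.frame(Town = c("A", "B", "C", "D"),
                    Population = c(1000, 800, 500, 200),
                    stringsAsFactors = TRUE)

# Convert every factor column to character, leaving other columns untouched
to_chr <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], as.character)
  d
}

Df_chr  <- to_chr(Df_f)
Df2_chr <- to_chr(Df2_f)
str(Df_chr)
```

With character columns on both sides, the join key types match and the pipeline above can be run on Df_chr and Df2_chr in place of Df and Df2.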
Notes:
First, left_join the two data frames on Town_To in Df and Town in Df2, which gives this intermediate result:
Town_From Town_To Distance Population
1 A B 10 800
2 A C 5 500
3 A D 18 200
4 B C 17 500
5 B D 20 200
6 C D 21 200
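The lookup behind this intermediate result can also be sketched in base R with match(). This is an alternative illustration rather than the answer's own code; it assumes each town appears exactly once in Df2, and joined is a hypothetical name.

```r
Df <- data.frame(Town_From = c("A", "A", "A", "B", "B", "C"),
                 Town_To   = c("B", "C", "D", "C", "D", "D"),
                 Distance  = c(10, 5, 18, 17, 20, 21),
                 stringsAsFactors = FALSE)
Df2 <- data.frame(Town = c("A", "B", "C", "D"),
                  Population = c(1000, 800, 500, 200),
                  stringsAsFactors = FALSE)

# For every row of Df, find the position of its destination town in Df2
# and pull the population stored at that position
joined <- Df
joined$Population <- Df2$Population[match(joined$Town_To, Df2$Town)]
print(joined)
```

A town missing from Df2 would get NA here, which is also what a left join produces for unmatched keys.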
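Next, group by Town_From and sort the table by Distance with arrange. The point is that we can now use cumsum on Population to compute the total population of the towns whose distance is less than or equal to that of the current row.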
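To make the running total concrete, here is a small base R sketch of the same idea using order() and ave(). It starts from the intermediate table shown earlier; sorted and Cum_Pop are hypothetical names used only for this illustration.

```r
# Intermediate table: each row carries the destination town's population
joined <- data.frame(Town_From  = c("A", "A", "A", "B", "B", "C"),
                     Town_To    = c("B", "C", "D", "C", "D", "D"),
                     Distance   = c(10, 5, 18, 17, 20, 21),
                     Population = c(800, 500, 200, 500, 200, 200),
                     stringsAsFactors = FALSE)

# Sort by distance so that nearer destinations come first
sorted <- joined[order(joined$Distance), ]

# Running total of destination populations, restarted for each origin town
sorted$Cum_Pop <- ave(sorted$Population, sorted$Town_From, FUN = cumsum)
print(sorted)
```

For origin A the running total is 500 (C), then 1300 (C and B), then 1500 (C, B and D). The origin town's own population is not yet included at this stage.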
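Then create the Pop_within_Distance column with mutate, using this calculation and adding the population of the origin town Town_From (looked up in Df2).
Finally, drop the Population column with select and restore the original row order with arrange.
Data: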
Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"), Town_To = c("B",
"C", "D", "C", "D", "D"), Distance = c(10, 5, 18, 17, 20, 21)), .Names = c("Town_From",
"Town_To", "Distance"), row.names = c(NA, -6L), class = "data.frame")
## Town_From Town_To Distance
##1 A B 10
##2 A C 5
##3 A D 18
##4 B C 17
##5 B D 20
##6 C D 21
Df2 <- structure(list(Town = c("A", "B", "C", "D"), Population = c(1000,
800, 500, 200)), .Names = c("Town", "Population"), row.names = c(NA,
-4L), class = "data.frame")
## Town Population
##1 A 1000
##2 B 800
##3 C 500
##4 D 200
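As a cross-check, the expected column can also be computed directly from the definition in the question, without a cumulative sum. This base R sketch is an alternative illustration, not the answer's method. It assumes Df lists every destination of interest for each origin, and in_radius and dest are hypothetical names. Because it compares distances with <=, destinations at exactly the same distance are all counted.

```r
Df <- data.frame(Town_From = c("A", "A", "A", "B", "B", "C"),
                 Town_To   = c("B", "C", "D", "C", "D", "D"),
                 Distance  = c(10, 5, 18, 17, 20, 21),
                 stringsAsFactors = FALSE)
Df2 <- data.frame(Town = c("A", "B", "C", "D"),
                  Population = c(1000, 800, 500, 200),
                  stringsAsFactors = FALSE)

Df$Pop_within_Distance <- vapply(seq_len(nrow(Df)), function(i) {
  # Rows with the same origin whose distance does not exceed this row's distance
  in_radius <- Df$Town_From == Df$Town_From[i] & Df$Distance <= Df$Distance[i]
  dest <- Df$Town_To[in_radius]
  # Sum the populations of the origin town and of every destination in the radius
  sum(Df2$Population[Df2$Town %in% c(Df$Town_From[i], dest)])
}, numeric(1))
print(Df)
```

On the sample data this gives the same values as the expected column in the question. It scans all rows for each row, so it is better suited to checking results on small tables than to large inputs.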