R: Identify points within a circle and compute a new column from two data frames

Time: 2016-12-04 12:37:14

Tags: r dplyr

I have a sample data set that looks like this:

   Town_From<-c("A","A","A","B","B","C")
   Town_To<-c("B","C","D","C","D","D")
   Distance<-c(10,5,18,17,20,21)
   Df<-data.frame(Town_From,Town_To,Distance)

 Town_From Town_To  Distance 
    A         B        10     
    A         C         5     
    A         D        18     
    B         C        17     
    B         D        20     
    C         D        21      

I have another data frame (Df2) holding the population of each town:

   Town<-c("A","B","C","D")
   Population<-c(1000,800,500,200)
   Df2<-data.frame(Town,Population)

  Town  Population
   A     1000
   B      800
   C      500
   D      200

What I need is a computed column, "Pop_within_Distance":

  Town_From Town_To  Distance  Pop_within_Distance
    A         B        10      2300
    A         C         5      1500
    A         D        18      2500
    B         C        17      1300
    B         D        20      1500
    C         D        21      700

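Town_From is my origin. "Pop_within_Distance" should be the total population of the towns that lie within the circle centered on "Town_From" whose radius is the distance from "Town_From" to "Town_To".

For example:

In row 1, "Pop_within_Distance" = Pop_A + Pop_B + Pop_C = 1000 + 800 + 500 = 2300 (because towns A, B and C lie within a circle of radius 10 around town A).

In row 4, "Pop_within_Distance" = Pop_B + Pop_C = 800 + 500 = 1300 (because only towns B and C lie within a circle of radius 17 around town B).

How can I calculate this in R?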

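1 answer:

Answer 0 (score: 0)

You can do this with dplyr, provided we first convert your data frames so that the Town_From, Town_To and Town columns are character rather than factor (or are factors with the same levels):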

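library(dplyr)
Df <- Df %>%
  left_join(Df2, by = c("Town_To" = "Town")) %>%  # attach each destination's population
  group_by(Town_From) %>%
  arrange(Distance) %>%                           # nearest destinations first
  # running total of destination populations, plus the origin's own population
  mutate(Pop_within_Distance = cumsum(Population) +
           Df2$Population[Df2$Town %in% Town_From]) %>%
  select(-Population) %>%
  arrange(Town_From, Town_To)                     # back to the original row order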
##Source: local data frame [6 x 4]
##Groups: Town_From [3]
##
##  Town_From Town_To Distance Pop_within_Distance
##      <chr>   <chr>    <dbl>               <dbl>
##1         A       B       10                2300
##2         A       C        5                1500
##3         A       D       18                2500
##4         B       C       17                1300
##5         B       D       20                1500
##6         C       D       21                 700
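
The character conversion mentioned before the pipeline is not shown in the code. A minimal base-R sketch of it, to be run before the join, might look like the following. It assumes the town columns were created as factors, which was the data.frame() default before R 4.0.0; stringsAsFactors = TRUE is set here only to reproduce that situation.

```r
# Sample data from the question, built with factor columns on purpose
Df <- data.frame(
  Town_From = c("A", "A", "A", "B", "B", "C"),
  Town_To   = c("B", "C", "D", "C", "D", "D"),
  Distance  = c(10, 5, 18, 17, 20, 21),
  stringsAsFactors = TRUE
)
Df2 <- data.frame(
  Town       = c("A", "B", "C", "D"),
  Population = c(1000, 800, 500, 200),
  stringsAsFactors = TRUE
)

# Convert the town columns from factor to character
Df$Town_From <- as.character(Df$Town_From)
Df$Town_To   <- as.character(Df$Town_To)
Df2$Town     <- as.character(Df2$Town)

# Inspect the column types after conversion
str(Df)
str(Df2)
```

If the columns are already character, as in the structure() data at the end of this answer, this step is a no-op.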

Notes:

  1. First, left_join the two data frames by Town_To in Df and Town in Df2, which gives this intermediate result:

      Town_From Town_To Distance Population
    1         A       B       10        800
    2         A       C        5        500
    3         A       D       18        200
    4         B       C       17        500
    5         B       D       20        200
    6         C       D       21        200
    
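  2. Then group_by Town_From and arrange the table by Distance. The point is that cumsum on Population now gives, for each row, the total population of the towns whose distance from the origin is less than or equal to that row's distance.

  3. Then use mutate to create the Pop_within_Distance column, adding to this cumulative sum the population of the origin town Town_From itself (looked up in Df2).
  4. Finally, drop the Population column and restore the original row order.
  5. Data: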

    Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"), Town_To = c("B", 
    "C", "D", "C", "D", "D"), Distance = c(10, 5, 18, 17, 20, 21)), .Names = c("Town_From", 
    "Town_To", "Distance"), row.names = c(NA, -6L), class = "data.frame")
    ##  Town_From Town_To Distance
    ##1         A       B       10
    ##2         A       C        5
    ##3         A       D       18
    ##4         B       C       17
    ##5         B       D       20
    ##6         C       D       21
    
    Df2 <- structure(list(Town = c("A", "B", "C", "D"), Population = c(1000, 
    800, 500, 200)), .Names = c("Town", "Population"), row.names = c(NA, 
    -4L), class = "data.frame")
    ##  Town Population
    ##1    A       1000
    ##2    B        800
    ##3    C        500
    ##4    D        200
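
Note 2 describes the cumulative sum as the total population of towns at a distance less than or equal to the current row's. As a cross-check, that definition can also be applied directly in base R, without sorting. This is only a sketch, not part of the original answer: the names pop and in_circle are illustrative. Like the dplyr pipeline, it only considers destinations listed under the same Town_From, which is what the expected output in the question does.

```r
# Sample data from the question
Df <- data.frame(
  Town_From = c("A", "A", "A", "B", "B", "C"),
  Town_To   = c("B", "C", "D", "C", "D", "D"),
  Distance  = c(10, 5, 18, 17, 20, 21),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  Town       = c("A", "B", "C", "D"),
  Population = c(1000, 800, 500, 200),
  stringsAsFactors = FALSE
)

# Named lookup vector: town name -> population (illustrative helper)
pop <- setNames(Df2$Population, Df2$Town)

# For each row, add the origin's population to that of every destination
# listed for the same origin at a distance no greater than this row's
Df$Pop_within_Distance <- sapply(seq_len(nrow(Df)), function(i) {
  in_circle <- Df$Town_From == Df$Town_From[i] & Df$Distance <= Df$Distance[i]
  unname(pop[Df$Town_From[i]]) + sum(pop[Df$Town_To[in_circle]])
})

Df
```

On the sample data this reproduces the Pop_within_Distance values 2300, 1500, 2500, 1300, 1500, 700. One difference from the cumsum version: if two destinations of the same origin were at exactly the same distance, the `<=` comparison here would count both for each of the tied rows, whereas a running sum gives the first tied row a smaller total.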