Question

I'm new to R and am working on a project.

My data.frame acscleantib looks like this:

head(acscleantib[-3])

#   Zip          Year Total_Population Median_Income City  State
#   ZCTA5 00601  2015            18088         10833 Adjun PR
#   ZCTA5 00602  2017            40859         16353 Agua  AB

My goal is to compute the difference in total population between 2015 and 2017.

My code:

library(dplyr)
library(tidyr)  # for spread()

popuinc <- acscleantib %>%
    dplyr::filter(Year %in% c(2015, 2017)) %>%
    spread(Year, Total_Population) %>%
    group_by(Zip) %>%
    summarise(Total2015 = sum(`2015`, na.rm = TRUE),
              Total2017 = sum(`2017`, na.rm = TRUE)) %>%
    mutate(Difference = Total2017 - Total2015)

popuinc

#    Zip       Total2015 Total2017 Difference
#  <fct>           <int>     <int>      <int>
#1 ZCTA5 00601     17982     17599       -383
#2 ZCTA5 00602     40260     39209      -1051
#3 ZCTA5 00603     52408     50135      -2273

This gives me the output I want. However, how can I add City to the filter so that the final result of mutate also shows the city for each Zip?

Example of the desired output:

 Zip          Total2015 Total2017  Difference City
   <fct>           <int>     <int>      <int>
 1 ZCTA5 00601     17982     17599       -383    Adjunitas
 2 ZCTA5 00602     40260     39209      -1051    XYZ
 3 ZCTA5 00603     52408     50135      -2273    etc

Answer 1

If I understand correctly, you can replace group_by(Zip) with group_by(Zip, City):

library(dplyr)
library(tidyr)

df %>%
    filter(Year %in% c(2015, 2017)) %>%
    spread(Year, Total_Population) %>%
    group_by(Zip, City) %>%
    summarise(
        Total2015 = sum(`2015`, na.rm = TRUE),  # backticks: the column named 2015
        Total2017 = sum(`2017`, na.rm = TRUE)) %>%
    mutate(Difference = Total2017 - Total2015)
## A tibble: 2 x 5
## Groups:   Zip [2]
#  Zip         City  Total2015 Total2017 Difference
#  <fct>       <fct>     <int>     <int>      <int>
#1 ZCTA5 00601 Adjun     18088         0     -18088
#2 ZCTA5 00602 Agua          0     40859      40859
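The backticks inside summarise() matter. spread() creates columns whose names start with a digit ("2015", "2017"), so they have to be quoted with backticks. A bare 2015 is simply the number 2015, and summing it returns 2015 no matter what the data contain. Here is a minimal sketch on a hypothetical one-column tibble:

```r
library(dplyr)

# Hypothetical toy data: a single column with the non-syntactic name "2015"
toy <- tibble(`2015` = c(10L, 20L, NA))

res <- toy %>%
  summarise(
    with_backticks = sum(`2015`, na.rm = TRUE),  # sums the column's values
    bare_number    = sum(2015, na.rm = TRUE)     # sums the constant 2015
  )

res
```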

Sample data

df <- read.table(text =
    "Zip           Year Total_Population Median_Income City  State
     'ZCTA5 00601' 2015 18088            10833         Adjun PR
     'ZCTA5 00602' 2017 40859            16353         Agua  AB",
    header = TRUE)
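The next example is self-contained and carries City through to the result. It is a sketch under a few assumptions. The four-row input is hypothetical: its population totals echo the question's printed output, but the rows themselves are illustrative. It also uses tidyr::pivot_wider() in place of spread(). pivot_wider() is the newer tidyr reshaping function and is swapped in here; it is not what the answer above uses. The sketch assumes dplyr >= 1.0 and tidyr >= 1.0.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format input: one row per Zip / City / Year
acs <- tibble(
  Zip  = rep(c("ZCTA5 00601", "ZCTA5 00602"), each = 2),
  City = rep(c("Adjun", "Agua"), each = 2),
  Year = rep(c(2015L, 2017L), times = 2),
  Total_Population = c(17982L, 17599L, 40260L, 39209L)
)

popuinc <- acs %>%
  filter(Year %in% c(2015, 2017)) %>%
  # Keep only the identifiers, the key and the value. In the real data,
  # columns that change between years (e.g. Median_Income) would otherwise
  # become extra id columns and stop the two years collapsing into one row.
  select(Zip, City, Year, Total_Population) %>%
  # One row per Zip/City; the year columns become Total2015 and Total2017
  pivot_wider(names_from = Year, values_from = Total_Population,
              names_prefix = "Total") %>%
  mutate(Difference = Total2017 - Total2015)

popuinc
```

If the data can contain several rows per Zip, City and Year, passing values_fn = sum to pivot_wider() (tidyr >= 1.1) adds them up, much like the sum() calls in the question.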

Why won't dplyr::filter let me filter on two variables?
