I'm new to R and working on a project.
My data.frame acscleantib has the following format:
head(acscleantib[-3])
# Zip Year Total_Population Median_Income City State
# ZCTA5 00601 2015 18088 10833 Adjun PR
# ZCTA5 00602 2017 40859 16353 Agua AB
My goal is to find the difference in total population between 2015 and 2017.
My input:
popuinc <- acscleantib %>%
  dplyr::filter(Year %in% c(2015, 2017)) %>%
  spread(Year, Total_Population) %>%
  group_by(Zip) %>%
  summarise(`Total2015` = sum(`2015`, na.rm = TRUE),
            `Total2017` = sum(`2017`, na.rm = TRUE)) %>%
  mutate(Difference = Total2017 - Total2015)
popuinc
# Zip Total2015 Total2017 Difference
# <fct> <int> <int> <int>
#1 ZCTA5 00601 17982 17599 -383
#2 ZCTA5 00602 40260 39209 -1051
#3 ZCTA5 00603 52408 50135 -2273
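This gives me the output I want. But how can I add City to the pipeline
so that the final difference is shown together with each city?
Example of the desired output: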
Zip Total2015 Total2017 Difference City
<fct> <int> <int> <int>
1 ZCTA5 00601 17982 17599 -383 Adjunitas
2 ZCTA5 00602 40260 39209 -1051 XYZ
3 ZCTA5 00603 52408 50135 -2273 etc
Answer 0 (score: 2)
If I understand correctly, you can replace group_by(Zip) with group_by(Zip, City):
library(dplyr)
library(tidyr)

df %>%
  filter(Year %in% c(2015, 2017)) %>%
  spread(Year, Total_Population) %>%
  group_by(Zip, City) %>%
  summarise(
    Total2015 = sum(`2015`, na.rm = TRUE),
    Total2017 = sum(`2017`, na.rm = TRUE)) %>%
  mutate(Difference = Total2017 - Total2015)
## A tibble: 2 x 5
## Groups: Zip [2]
# Zip City Total2015 Total2017 Difference
# <fct> <fct> <int> <int> <int>
#1 ZCTA5 00601 Adjun 18088 0 -18088
#2 ZCTA5 00602 Agua 0 40859 40859
# (each Zip in the sample data has only one year, so the missing year sums to 0)
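The backticks around the year column names matter inside summarise(): a bare 2015 is read by R as the number 2015, not as the column named 2015. Here is a minimal sketch of the difference, using a made-up toy data frame (toy and its values are hypothetical, not taken from the question):

```r
library(dplyr)

# Hypothetical toy data with a non-syntactic column name "2015"
toy <- data.frame(`2015` = c(10L, 20L, NA), check.names = FALSE)

res <- summarise(toy,
  literal = sum(2015, na.rm = TRUE),    # sums the number 2015 itself
  column  = sum(`2015`, na.rm = TRUE))  # sums the values in column "2015"
res
```

Here literal is just the constant 2015, while column is the actual column total (10 + 20, with the NA dropped).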
Sample data:
df <- read.table(text =
"Zip Year Total_Population Median_Income City State
'ZCTA5 00601' 2015 18088 10833 Adjun PR
'ZCTA5 00602' 2017 40859 16353 Agua AB", header = TRUE, stringsAsFactors = TRUE)
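As a possible alternative, the same reshaping can be sketched with tidyr::pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. This is only a sketch under assumptions: the toy data frame and its population values are made up for illustration, and it assumes City is constant within each Zip. Selecting only the needed columns first keeps Median_Income from splitting a Zip across several rows, so no group_by()/summarise() step is needed here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: every Zip has a row for both years
toy <- data.frame(
  Zip = c("ZCTA5 00601", "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00602"),
  Year = c(2015, 2017, 2015, 2017),
  Total_Population = c(100, 90, 200, 230),
  City = c("Adjun", "Adjun", "Agua", "Agua")
)

result <- toy %>%
  filter(Year %in% c(2015, 2017)) %>%
  # keep only the identifying columns plus the ones being reshaped
  select(Zip, City, Year, Total_Population) %>%
  # one column per year, named Total2015 and Total2017
  pivot_wider(names_from = Year,
              values_from = Total_Population,
              names_prefix = "Total") %>%
  mutate(Difference = Total2017 - Total2015)
result
```

If a Zip could map to more than one City, or have several rows per year, the group_by(Zip, City) and summarise() approach above would still be needed.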