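I have a data frame with the total points scored over the last 3 years (2016, 2017, 2018), plus one column for the points scored in each year.
My data frame looks like this: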
myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, NA, NA, 2, NA, NA, 3),
Points2017 = c(NA,1,NA,NA,2,NA,NA),
Points2018= c(NA,NA,1, NA, NA, 2, NA))
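The problem is that, within each group, I want to copy the values of the Points2016, Points2017 and Points2018 columns so that every row has the same entries.
I'm not sure that explanation is clear, so this is my expected output: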
myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, 3, 3, 2, 2, NA, 3),
Points2017 = c(1,1,1,2,2,NA,NA),
Points2018= c(1,1,1, NA, NA, 2, NA))
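Basically, I want the Points201X columns to have the same values in every row of a given ID.
Answer 0 (score: 7)
I think you can group by ID and fill in both directions. Using dplyr and tidyr, we can do: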
library(dplyr)
library(tidyr)
myDF %>%
group_by(ID) %>%
fill(Points2016, Points2017, Points2018) %>%
fill(Points2016, Points2017, Points2018, .direction = "up")
Which returns:
  ID Dates Total_Points Points2016 Points2017 Points2018
1  1  2016            5          3          1          1
2  1  2017            5          3          1          1
3  1  2018            5          3          1          1
4  2  2016            4          2          2         NA
5  2  2017            4          2          2         NA
6  3  2018            2         NA         NA          2
7  4  2016            3          3         NA         NA
Also, if you have a bunch of years, say 1970 to 2018, you could do something like:
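On tidyr 1.0.0 or later (an assumption; the answer doesn't state a version), fill() also accepts .direction = "downup", which fills down and then up in a single call. A minimal sketch on the same data:

```r
library(dplyr)
library(tidyr)

myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled <- myDF %>%
  group_by(ID) %>%
  # "downup" fills downwards first, then upwards, within each ID group
  fill(Points2016, Points2017, Points2018, .direction = "downup") %>%
  ungroup()
```

An ID with no value at all for a given year (ID 3 for 2016 and 2017) keeps its NA, since there is nothing to copy.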
myDF %>%
gather(points_year, points, -c(ID, Dates, Total_Points)) %>%
group_by(ID, points_year) %>%
fill(points) %>%
fill(points, .direction = "up") %>%
spread(points_year, points)
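This saves you from typing out every year. However, it involves gathering and spreading the data, which may be unnecessary if the variables we need to fill follow a consistent naming convention. Here they do, so we can use the dplyr select helper starts_with() to fill every variable whose name starts with "Points":
myDF %>%
group_by(ID) %>%
fill(starts_with("Points"), .direction = "down") %>%
fill(starts_with("Points"), .direction = "up")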
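gather() and spread(), used in the reshape approach above, still work but are superseded by pivot_longer() and pivot_wider() in tidyr 1.0.0 and later. Assuming a recent tidyr (not something the answer specifies), the same reshape-and-fill idea might look like this sketch, which also uses .direction = "downup" to do both passes at once:

```r
library(dplyr)
library(tidyr)

myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled <- myDF %>%
  # one row per (original row, year column); the years are never listed by hand
  pivot_longer(starts_with("Points"), names_to = "points_year", values_to = "points") %>%
  group_by(ID, points_year) %>%
  # fill down, then up, within each ID/year combination
  fill(points, .direction = "downup") %>%
  ungroup() %>%
  # back to one column per year; ID, Dates and Total_Points identify the rows
  pivot_wider(names_from = points_year, values_from = points)
```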
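Alternatively, this seems to work with data.table and zoo:
library(data.table)
library(zoo)
dt <- as.data.table(myDF)
# First pass: carry values forward within each ID
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
# Second pass: carry values backward within each ID
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]
This one-liner also does it in one go, by nesting the backward fill around the forward fill:
dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(na.locf0(x), fromLast = TRUE)), by = ID, .SDcols = 4:6]
Use na.locf0() here rather than na.locf(): with its default na.rm = TRUE, na.locf() drops leading NAs, so its result can be shorter than the group, which := does not accept.
Either way, dt ends up as: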
ID Dates Total_Points Points2016 Points2017 Points2018
1: 1 2016 5 3 1 1
2: 1 2017 5 3 1 1
3: 1 2018 5 3 1 1
4: 2 2016 4 2 2 NA
5: 2 2017 4 2 2 NA
6: 3 2018 2 NA NA 2
7: 4 2016 3 3 NA NA
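If you'd rather avoid package dependencies, a base-R sketch of the same two-pass fill is possible with ave(), which applies a function within each ID group and returns a vector aligned with the original rows. The helper fill_both() is hypothetical, written only for this example:

```r
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

# Hypothetical helper: carry the last non-NA value forward, then carry the
# next non-NA value backward, so every NA in a group takes the group's value
fill_both <- function(x) {
  locf <- function(v) {
    idx <- cumsum(!is.na(v))       # position of the most recent non-NA value seen so far
    c(NA, v[!is.na(v)])[idx + 1]   # idx == 0 (nothing seen yet) maps to NA
  }
  rev(locf(rev(locf(x))))
}

# Every column whose name starts with "Points" (Total_Points does not match ^Points)
point_cols <- grep("^Points", names(myDF), value = TRUE)
for (col in point_cols) {
  myDF[[col]] <- ave(myDF[[col]], myDF$ID, FUN = fill_both)
}
```

Because ave() keeps the original row order, each column can be overwritten directly without any reordering or reshaping.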
Answer 1 (score: 2)
You can also use zoo::na.locf0 to fill the NAs from the top and from the bottom.
library(tidyverse)
library(zoo)
myDF %>%
group_by(ID) %>%
# fill forward (top down), then backward (bottom up)
mutate_at(vars(contains("Points20")), ~ na.locf0(., fromLast = FALSE)) %>%
mutate_at(vars(contains("Points20")), ~ na.locf0(., fromLast = TRUE))
# # A tibble: 7 x 6
# # Groups:   ID [4]
#      ID Dates Total_Points Points2016 Points2017 Points2018
#   <dbl> <fct>        <dbl>      <dbl>      <dbl>      <dbl>
# 1    1. 2016            5.         3.         1.         1.
# 2    1. 2017            5.         3.         1.         1.
# 3    1. 2018            5.         3.         1.         1.
# 4    2. 2016            4.         2.         2.        NA
# 5    2. 2017            4.         2.         2.        NA
# 6    3. 2018            2.        NA         NA          2.
# 7    4. 2016            3.         3.        NA         NA
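In dplyr 1.0.0 and later, mutate_at() is superseded by across(). Assuming that version, plus zoo for na.locf0(), the two passes can be nested inside a single mutate() call. A minimal sketch:

```r
library(dplyr)
library(zoo)

myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled <- myDF %>%
  group_by(ID) %>%
  # inner na.locf0() fills forward; outer one fills the result backward
  mutate(across(contains("Points20"), ~ na.locf0(na.locf0(.x), fromLast = TRUE))) %>%
  ungroup()
```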