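R - Copy values within groups

Time: 2018-03-13 12:20:01

Tags: r copy replicate

I have a data frame that holds the total points scored over the past 3 years (2016, 2017, 2018), along with a separate column for the points scored in each year.

My data frame looks like this: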

myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
 Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
 Total_Points = c(5, 5, 5, 4, 4, 2, 3),
 Points2016 = c(3, NA, NA, 2, NA, NA, 3),
 Points2017 = c(NA,1,NA,NA,2,NA,NA),
 Points2018= c(NA,NA,1, NA, NA, 2, NA))

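The problem is that I want to replicate the values of the "Points2016", "Points2017" and "Points2018" columns within each group, so that every row of a group shows the same entries.

I'm not sure my explanation is clear, so this is my expected output: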

myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
               Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
               Total_Points = c(5, 5, 5, 4, 4, 2, 3),
               Points2016 = c(3, 3, 3, 2, 2, NA, 3),
               Points2017 = c(1,1,1,2,2,NA,NA),
               Points2018= c(1,1,1, NA, NA, 2, NA))

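Basically, I want the "Points201X" columns to have the same value on every row of each ID.

2 answers:

Answer 0: (score: 7)

I think you can fill in both directions within each ID group. Using dplyr and tidyr, we can do: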

library(dplyr)
library(tidyr)

myDF %>% 
  group_by(ID) %>% 
  fill(Points2016, Points2017, Points2018) %>% 
  fill(Points2016, Points2017, Points2018, .direction = "up")

Which returns:

  ID Dates Total_Points Points2016 Points2017 Points2018
1  1  2016            5          3          1          1
2  1  2017            5          3          1          1
3  1  2018            5          3          1          1
4  2  2016            4          2          2         NA
5  2  2017            4          2          2         NA
6  3  2018            2         NA         NA          2
7  4  2016            3          3         NA         NA
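
The two chained fill() calls can probably be collapsed into one. The sketch below assumes tidyr 1.0.0 or later, where fill() accepts .direction = "downup" (fill down first, then up). It rebuilds myDF and myDF_final from the question so it runs on its own; `filled` is only an illustrative name.

```r
library(dplyr)
library(tidyr)

# Data from the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

# Expected output from the question
myDF_final <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                         Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                         Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                         Points2016 = c(3, 3, 3, 2, 2, NA, 3),
                         Points2017 = c(1, 1, 1, 2, 2, NA, NA),
                         Points2018 = c(1, 1, 1, NA, NA, 2, NA))

filled <- myDF %>%
  group_by(ID) %>%
  # "downup" fills down within each ID, then up to cover leading NAs
  fill(Points2016, Points2017, Points2018, .direction = "downup") %>%
  ungroup() %>%
  as.data.frame()

# Compare against the expected output
all.equal(filled, myDF_final)
```

Groups that have no value at all for a given year (for example ID 3 in Points2016) stay NA, because there is nothing inside the group to copy from.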

Also, if you have a bunch of years, say 1970 through 2018, you could do something like:

myDF %>% 
  gather(points_year, points, -c(ID, Dates, Total_Points)) %>% 
  group_by(ID, points_year) %>% 
  fill(points) %>% 
  fill(points, .direction = "up") %>% 
  spread(points_year, points)

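This saves typing out each year. However, it involves gathering and spreading the data, which may be unnecessary when the variables we need to fill follow a consistent naming convention. In this case they do, so we can use the selection helpers available with dplyr, here starts_with(), to fill all variables whose names start with "Points":

myDF %>% 
  group_by(ID) %>% 
  fill(starts_with("Points"), .direction = "down") %>% 
  fill(starts_with("Points"), .direction = "up")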
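
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a sketch of the same long-format reshaping with the newer functions, assuming such a tidyr version is installed. The names `points_year` and `points` follow the answer; `filled_long` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Data from the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled_long <- myDF %>%
  # one row per (original row, year column); Total_Points is not matched
  pivot_longer(starts_with("Points"),
               names_to = "points_year", values_to = "points") %>%
  group_by(ID, points_year) %>%
  # fill within each ID and year column, down first and then up
  fill(points, .direction = "downup") %>%
  ungroup() %>%
  # back to one column per year
  pivot_wider(names_from = points_year, values_from = points) %>%
  as.data.frame()

filled_long
```

This only reshapes cleanly because each combination of ID, Dates and Total_Points identifies a single row; with duplicated rows pivot_wider() would return list-columns instead.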

Alternatively, this seems to work with data.table and zoo:

library(data.table)
library(zoo)

dt <- as.data.table(myDF)
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]

This one-liner seems to do it all in one go as well. Note that na.locf drops leading NAs by default, so it depends on the shortened result being recycled within each group, which recent data.table versions may refuse:

dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf(x)), by = ID, .SDcols = 4:6]

   ID Dates Total_Points Points2016 Points2017 Points2018
1:  1  2016            5          3          1          1
2:  1  2017            5          3          1          1
3:  1  2018            5          3          1          1
4:  2  2016            4          2          2         NA
5:  2  2017            4          2          2         NA
6:  3  2018            2         NA         NA          2
7:  4  2016            3          3         NA         NA
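
The data.table route may not need zoo at all. The sketch below assumes data.table 1.12.4 or later, which provides nafill() with type = "locf" (last observation carried forward) and type = "nocb" (next observation carried backward) for numeric vectors. Selecting the columns by name instead of by position 4:6 is a choice made here, not part of the answer; `cols` is an illustrative name.

```r
library(data.table)

# Data from the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

dt <- as.data.table(myDF)

# Columns whose names start with "Points" (Total_Points is not matched)
cols <- grep("^Points", names(dt), value = TRUE)

# Within each ID: carry values forward, then backward to cover leading NAs
dt[, (cols) := lapply(.SD, function(x) nafill(nafill(x, type = "locf"), type = "nocb")),
   by = ID, .SDcols = cols]

dt
```

Because nafill() always returns a vector of the same length as its input, this does not depend on recycling within groups.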

Answer 1: (score: 2)

You can also use zoo::na.locf0 to fill the NAs from the top and from the bottom.

library(tidyverse)
library(zoo)

myDF %>%
    group_by(ID) %>%
    # fill forward within each ID, then backward to cover leading NAs
    mutate_at(vars(contains("Points20")), ~ na.locf0(., fromLast = FALSE)) %>%
    mutate_at(vars(contains("Points20")), ~ na.locf0(., fromLast = TRUE))
## A tibble: 7 x 6
## Groups:   ID [4]
#     ID Dates Total_Points Points2016 Points2017 Points2018
#  <dbl> <fct>        <dbl>      <dbl>      <dbl>      <dbl>
#1    1. 2016            5.         3.         1.         1.
#2    1. 2017            5.         3.         1.         1.
#3    1. 2018            5.         3.         1.         1.
#4    2. 2016            4.         2.         2.        NA
#5    2. 2017            4.         2.         2.        NA
#6    3. 2018            2.        NA         NA          2.
#7    4. 2016            3.         3.        NA         NA
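
To make explicit what the forward and backward fills are doing, here is a dependency-free sketch in base R. The helpers fill_down(), fill_up() and fill_both() are hypothetical names written for this illustration and are not part of zoo, tidyr or data.table; ave() applies the helper within each ID.

```r
# Data from the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

# Carry the last non-NA value forward; leading NAs stay NA
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))          # how many non-NA values seen so far
  c(x[NA_integer_], x[!is.na(x)])[seen + 1]
}

# Carry the next non-NA value backward by filling the reversed vector
fill_up <- function(x) rev(fill_down(rev(x)))

# Forward fill, then backward fill for whatever is still NA
fill_both <- function(x) fill_up(fill_down(x))

cols <- grep("^Points", names(myDF), value = TRUE)

res <- myDF
# ave() splits each column by ID, applies fill_both, and reassembles it
res[cols] <- lapply(myDF[cols], function(col) ave(col, myDF$ID, FUN = fill_both))

res
```

A group with no observed value in a column is returned unchanged, which matches the NA entries in the expected output.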