为R中的每个组填充相同的值

时间:2016-08-10 20:23:57

标签: r

我有数据框

ID <- c(1,1,2,2,2,3,3)
x <- c("1st","","1st","1st","","","")
y <- c("2nd","2nd","","","","2nd","2nd")
z <- c("","","3rd","3rd","","","3rd")
df <- data.frame(ID,x,y,z)
df
  ID   x   y   z
1  1 1st 2nd    
2  1     2nd    
3  2 1st     3rd
4  2 1st     3rd
5  2            
6  3     2nd    
7  3     2nd 3rd

我希望按ID,输出

填充相同的值
  ID   x   y   z  x1  y1  z1
1  1 1st 2nd     1st 2nd    
2  1     2nd     1st 2nd    
3  2 1st     3rd 1st     3rd
4  2 1st     3rd 1st     3rd
5  2             1st     3rd
6  3     2nd         2nd 3rd
7  3     2nd 3rd     2nd 3rd

如果ID 1具有1st,则新变量x1将具有ID1的全部“1st”,依此类推 如果我有更多变量,则更新数据,但我只需要使用x,y,z

ID <- c(1,1,2,2,2,3,3)
x <- c("1st","","1st","1st","","","")
y <- c("2nd","2nd","","","","2nd","2nd")
z <- c("","","3rd","3rd","","","3rd")
m <- c(10:16)
n <- c(20:26)
df <- data.frame(ID,x,y,z,m,n)

3 个答案:

答案 0 :(得分:2)

这是一种利用tidyr::fill的方法。如果您使用NA而不是空字符串(一个好主意),这种方法将非常简单:

library(dplyr)
library(tidyr)

       # add versions of x to z with NA instead of empty strings
df %>% mutate_at(vars(x:z), funs('1' = na_if(., ''))) %>% 
    # set grouping for following operations
    group_by(ID) %>% 
    # for added columns, fill values downwards and upwards within each group
    fill(x_1:z_1) %>% fill(x_1:z_1, .direction = 'up') %>%
    # reinsert empty strings for NAs
    mutate_at(vars(x_1:z_1), funs(coalesce(., factor(''))))

## Source: local data frame [7 x 9]
## Groups: ID [3]
## 
##      ID      x      y      z     m     n    x_1    y_1    z_1
##   <dbl> <fctr> <fctr> <fctr> <int> <int> <fctr> <fctr> <fctr>
## 1     1    1st    2nd           10    20    1st    2nd       
## 2     1           2nd           11    21    1st    2nd       
## 3     2    1st           3rd    12    22    1st           3rd
## 4     2    1st           3rd    13    23    1st           3rd
## 5     2                         14    24    1st           3rd
## 6     3           2nd           15    25           2nd    3rd
## 7     3           2nd    3rd    16    26           2nd    3rd

答案 1 :(得分:2)

使用data.table稍微更直接的方法:

df = data.frame(ID, x, y, z, stringsAsFactors=FALSE)

require(data.table)
setDT(df)[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(x != "")]), by = ID]

答案 2 :(得分:1)

我们可以使用dplyr

library(dplyr)
df %>% 
     group_by(ID) %>%
     mutate_each(funs((.[.!=""][1]))) %>%
     setNames(., c("ID", paste0(names(df)[-1], 1))) %>%
     select(-ID) %>% 
     bind_cols(df, .)
#ID   x   y   z ID   x1   y1   z1
#1  1 1st 2nd      1  1st  2nd <NA>
#2  1     2nd      1  1st  2nd <NA>
#3  2 1st     3rd  2  1st <NA>  3rd
#4  2 1st     3rd  2  1st <NA>  3rd
#5  2              2  1st <NA>  3rd
#6  3     2nd      3 <NA>  2nd  3rd
#7  3     2nd 3rd  3 <NA>  2nd  3rd