Find the value in column X corresponding to column Y

Date: 2018-05-04 06:44:20

Tags: r

df <- data.frame(a = c(rep("a", 3), rep("b", 3), rep("c", 3)),
                 b = c(NA, NA, "test", NA, "test", "test", NA, NA, "test"),
                 c = c("trial", "test", "trial", "trial", "test", "trial", "trial",
                       "trial", "trial"), stringsAsFactors = FALSE)  

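Suppose df contains three variables: one group (a), one b value, and one c value.

For each row, I want to find the value in column c that corresponds to the last missing value in column b up to that row.

My expected output is the content of the try column.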

+---------------------+
| try   a  b    c     |
+---------------------+
| trial a NA   trial  |
| test  a NA   test   |
| test  a test trial  |
| trial b NA   trial  |
| trial b test test   |
| trial b test trial  |
| trial c NA   trial  |
| trial c NA   trial  |
| trial c test trial  |
+---------------------+  

For now, I have written a quick but inefficient loop, which also does not let me group by anything.

miss <- c()
try <- c()

for (i in seq_along(df$b)) {

  # row index of the last NA in b among rows 1..i
  miss[i] <- max(which(is.na(df[1:i,]$b)))

  # value of c (column 3) at that row
  try[i] <- df[miss[i], 3]

}

new <- cbind(as.data.frame(try), df)

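However, I would like to convert this to a data.table or dplyr approach, so that I can eventually also run it per group, on large datasets, and so on.

Any ideas?

1 answer:

Answer 0: (score: 3)

Here are two ideas using dplyr:
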
library(tidyverse)

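# METHOD 1: non-NA rows take c from the group's last NA row,
# so this assumes the NA rows come first within each group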
df %>% 
 group_by(a) %>% 
 mutate(new = tail(c[is.na(b)], 1), 
        new = replace(new, is.na(b), c[is.na(b)]))

# METHOD 2: blank out c where b is not NA, then carry the
# last kept value down within each group
df %>% 
 group_by(a) %>% 
 mutate(new = replace(c, !is.na(b), NA)) %>% 
 fill(new)

Both give:

# A tibble: 9 x 4
# Groups:   a [3]
  a     b     c     new  
  <chr> <chr> <chr> <chr>
1 a     <NA>  trial trial
2 a     <NA>  test  test 
3 a     test  trial test 
4 b     <NA>  trial trial
5 b     test  test  trial
6 b     test  trial trial
7 c     <NA>  trial trial
8 c     <NA>  trial trial
9 c     test  trial trial
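
For comparison, here is a minimal base R sketch of the same "carry the last NA row forward" idea, without the loop. It is not from the original answer: the helper name `last_na_value` and its arguments `key` and `value` are hypothetical, chosen only for illustration. It assumes rows are already in the order you want within each group, and it returns NA for rows that come before the first NA in their group.

```r
# Rebuild the example data from the question.
df <- data.frame(
  a = rep(c("a", "b", "c"), each = 3),
  b = c(NA, NA, "test", NA, "test", "test", NA, NA, "test"),
  c = c("trial", "test", "trial", "trial", "test", "trial",
        "trial", "trial", "trial"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the original post): for each position,
# return the element of `value` at the most recent position where `key` is NA.
last_na_value <- function(key, value) {
  # Position of each NA, 0 elsewhere; cummax carries the latest NA position forward.
  idx <- cummax(ifelse(is.na(key), seq_along(key), 0L))
  # No NA seen yet: index with NA so the result is NA rather than dropped.
  idx[idx == 0L] <- NA_integer_
  value[idx]
}

# Apply the helper within each group of `a`, then restore the original row order.
per_group <- lapply(split(df, df$a), function(g) last_na_value(g$b, g$c))
df$try <- unsplit(per_group, df$a)
df
```

Because the helper only takes two vectors, it should also drop into a grouped call such as `df %>% group_by(a) %>% mutate(try = last_na_value(b, c))` in dplyr, or `dt[, try := last_na_value(b, c), by = a]` on a data.table, though neither variant is shown in the original answer.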