df <- data.frame(
  a = c(rep("a", 3), rep("b", 3), rep("c", 3)),
  b = c(NA, NA, "test", NA, "test", "test", NA, NA, "test"),
  c = c("trial", "test", "trial", "trial", "test", "trial", "trial",
        "trial", "trial"),
  stringsAsFactors = FALSE
)
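Suppose df contains three variables: a grouping column (a), a column of values b, and a column of values c.
For each row, I want the value of c from the most recent row (at or before the current one) where b is missing.
My expected output is the try column below.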
+-------+---+------+-------+
| try   | a | b    | c     |
+-------+---+------+-------+
| trial | a | NA   | trial |
| test  | a | NA   | test  |
| test  | a | test | trial |
| trial | b | NA   | trial |
| trial | b | test | test  |
| trial | b | test | trial |
| trial | c | NA   | trial |
| trial | c | NA   | trial |
| trial | c | test | trial |
+-------+---+------+-------+
For now I have a quick but inefficient loop, which also does not let me group by anything.
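# Preallocate the result vectors instead of growing them inside the loop
miss <- integer(nrow(df))
try <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  # Index of the last row up to i where b is missing
  # (if rows 1..i contain no NA, max() warns and returns -Inf)
  miss[i] <- max(which(is.na(df$b[1:i])))
  # Value of c at that row
  try[i] <- df$c[miss[i]]
}
new <- cbind(data.frame(try = try, stringsAsFactors = FALSE), df)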
However, I would like to convert this into a data.table or dplyr approach that I can eventually run per group, on large datasets, and so on.
Any ideas?
Answer 0 (score: 3)
Here is an approach using dplyr:
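library(tidyverse)
# METHOD 1:
# Start from the c value of the group's last NA row, then overwrite the NA rows
# with their own c. Note: this matches the expected output only when every NA
# in a group comes before the non-NA rows, as in the sample data.
df %>%
  group_by(a) %>%
  mutate(new = tail(c[is.na(b)], 1),
         new = replace(new, is.na(b), c[is.na(b)]))
# METHOD 2:
# Keep c only where b is NA, then carry that value down within each group.
df %>%
  group_by(a) %>%
  mutate(new = replace(c, !is.na(b), NA)) %>%
  fill(new)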
Both give:
# A tibble: 9 x 4
# Groups:   a [3]
  a     b     c     new
  <chr> <chr> <chr> <chr>
1 a     <NA>  trial trial
2 a     <NA>  test  test
3 a     test  trial test
4 b     <NA>  trial trial
5 b     test  test  trial
6 b     test  trial trial
7 c     <NA>  trial trial
8 c     <NA>  trial trial
9 c     test  trial trial
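For anyone who prefers to avoid extra packages, the same "carry the last NA row forward" idea can be sketched in base R. The helper name `last_na_value` is hypothetical (it is not part of the answer above or of any package), and this is only one possible way to write it, assuming rows are already in the order you want within each group.

```r
# Hypothetical helper: for each position, return the entry of `value` at the
# most recent position (at or before it) where `b` is NA.
last_na_value <- function(b, value) {
  # NA positions keep their index, other positions become 0;
  # cummax() then carries the latest NA position forward.
  idx <- cummax(seq_along(b) * is.na(b))
  # Positions with no NA so far have idx == 0; turn them into NA results.
  value[replace(idx, idx == 0, NA)]
}

df <- data.frame(
  a = c(rep("a", 3), rep("b", 3), rep("c", 3)),
  b = c(NA, NA, "test", NA, "test", "test", NA, NA, "test"),
  c = c("trial", "test", "trial", "trial", "test", "trial", "trial",
        "trial", "trial"),
  stringsAsFactors = FALSE
)

# Apply the helper within each group of `a`, then put the pieces back
# in the original row order.
df$try <- unsplit(
  Map(last_na_value, split(df$b, df$a), split(df$c, df$a)),
  df$a
)
df$try
```

Because the helper is a plain vectorized function, it should also drop into a grouped data.table call along the lines of `setDT(df)[, try := last_na_value(b, c), by = a]`, or into `mutate()` after `group_by(a)`. Unlike Method 1, it does not rely on the NA rows coming first within a group.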