Question

I have a data frame named df that looks like this:

x   y
A   NA 
B   d1
L   d2 
F   c1 
L   s2 
A   c4 
B   NA
B   NA
A   c1
F   a5
G   NA
H   NA

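I want to group by x and, where possible, fill each NA with the first non-NA element of its group. Note that some groups contain no non-NA elements, so returning NA for those is fine.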

df %>% group_by(x) %>% mutate(new_y = first(y))

This returns the first value in each group, NAs included, even when the group contains a non-NA value.
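The behaviour follows from first(y) taking whatever sits in position 1 of each group. A minimal base R sketch of the difference, using the values of group A from the example (the names y_a and y_g are illustrative only):

```r
# Values of y for group A, in row order
y_a <- c(NA, "c4", "c1")

# Taking position 1 returns the leading NA
y_a[1]

# Dropping the NAs before indexing returns the first non-NA value
y_a[!is.na(y_a)][1]

# For an all-NA group the filtered vector is empty, so [1] yields NA
y_g <- NA_character_
y_g[!is.na(y_g)][1]
```

Because indexing an empty vector with [1] gives NA, groups without any non-NA value need no special handling.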

Answer 1

We can use replace:

df %>%
   group_by(x) %>%
   mutate(y = replace(y, is.na(y), y[!is.na(y)][1]))
#      x     y
#   <chr> <chr>
#1      A    c4
#2      B    d1
#3      L    d2
#4      F    c1
#5      L    s2
#6      A    c4
#7      B    d1
#8      B    d1
#9      A    c1
#10     F    a5
#11     G  <NA>
#12     H  <NA>
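To see what the replace() call does inside one group, the expression can be pulled out into a small function and run on single group vectors. This is only a sketch; fill_first_non_na is a hypothetical helper name, not part of dplyr or base R.

```r
# Hypothetical helper wrapping the replace() expression from the answer
fill_first_non_na <- function(v) {
  first_value <- v[!is.na(v)][1]      # NA when the group has no non-NA value
  replace(v, is.na(v), first_value)   # overwrite only the NA positions
}

fill_first_non_na(c("d1", NA, NA))    # group B
fill_first_non_na(c(NA, "c4", "c1"))  # group A
fill_first_non_na(NA_character_)      # group G stays NA
```

Only the NA positions are overwritten, which is why row 9 of group A keeps its own value c1 instead of becoming c4.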

Or we can use a join in data.table:

library(data.table)
library(dplyr)  # coalesce() is exported by dplyr, not tidyr
setDT(df)[df[order(x, is.na(y)), .SD[1L], x], y := coalesce(y, i.y),on = .(x)]
df
#    x  y
# 1: A c4
# 2: B d1
# 3: L d2
# 4: F c1
# 5: L s2
# 6: A c4
# 7: B d1
# 8: B d1
# 9: A c1
#10: F a5
#11: G NA
#12: H NA
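The join works in two steps: build a lookup with one row per group holding its first non-NA y, then fill the NAs from that lookup by key. The same idea can be sketched in base R. This is not the data.table code itself, just an illustration of the logic; the names lookup, is_na and filled are made up for the example.

```r
df <- data.frame(
  x = c("A", "B", "L", "F", "L", "A", "B", "B", "A", "F", "G", "H"),
  y = c(NA, "d1", "d2", "c1", "s2", "c4", NA, NA, "c1", "a5", NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: one value per group, the first non-NA y (NA if the group has none)
lookup <- vapply(split(df$y, df$x),
                 function(v) v[!is.na(v)][1],
                 character(1))

# Step 2: keep existing values and fill only the NAs from the lookup,
# matching on the group key x
filled <- df$y
is_na <- is.na(filled)
filled[is_na] <- lookup[df$x[is_na]]
filled
```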

Or using base R:

df$y <- with(df, ave(y, x, FUN = function(x) replace(x, is.na(x), x[!is.na(x)][1])))

Data

df <- structure(list(x = c("A", "B", "L", "F", "L", "A", "B", "B", 
 "A", "F", "G", "H"), y = c(NA, "d1", "d2", "c1", "s2", "c4", 
NA, NA, "c1", "a5", NA, NA)), .Names = c("x", "y"), class = "data.frame",
row.names = c(NA, -12L))

How to return the first non-NA element of a group when a non-NA value exists
