Fill missing values within groups

Time: 2016-05-01 02:11:11

Tags: r missing-data

I have a data frame in which some values are missing:

A 1
A NA
A NA
B NA
B 2
B NA
C NA
C NA
C NA

How can I fill in the missing values for the groups that have data?

3 answers:

Answer 0: (score: 5)

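We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'ID', assign (:=) the first non-NA value of 'v1' to the whole column 'v1'. A group with no non-NA value stays NA.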

library(data.table)
setDT(df1)[, v1:= v1[!is.na(v1)][1L] , by = ID]
df1
#   ID v1
#1:  A  1
#2:  A  1
#3:  A  1
#4:  B  2
#5:  B  2
#6:  B  2
#7:  C NA
#8:  C NA
#9:  C NA

Or using only base R:

 with(df1, ave(v1, ID, FUN = function(x)
          replace(x, is.na(x), x[!is.na(x)][1L])))
 #[1]  1  1  1  2  2  2 NA NA NA
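
Both snippets above rely on the idiom x[!is.na(x)][1L]: it takes the first non-NA element of x and evaluates to NA when x has no non-NA element, which is why group C is left untouched. A minimal base R sketch of that idea follows; the helper name fill_first_non_na is hypothetical and not part of the original answer, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Hypothetical helper (not from the answer): replace the NAs in x with the
# first non-NA value of x. An all-NA vector comes back unchanged.
fill_first_non_na <- function(x) {
  first_value <- x[!is.na(x)][1L]  # NA when x has no non-NA element
  replace(x, is.na(x), first_value)
}

# Same shape as the example data in the question
df1 <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  v1 = c(1L, NA, NA, NA, 2L, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# ave() applies the helper within each ID group and keeps the row order
df1$v1 <- ave(df1$v1, df1$ID, FUN = fill_first_non_na)
df1$v1
```

Note that, unlike the with(...) call above, this version writes the result back into df1.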

Data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "B", "C", "C", 
"C"), v1 = c(1L, NA, NA, NA, 2L, NA, NA, NA, NA)), .Names = c("ID", 
"v1"), class = "data.frame", row.names = c(NA, -9L))

Answer 1: (score: 5)

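An alternative solution, though it is perhaps a bit flawed in how many assumptions it makes. It relies on arrange() sorting NA values last, so V2[1] picks the smallest non-NA value in each group; that is only the right answer if each group holds at most one distinct non-NA value: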

library(dplyr)
y %>%
  group_by(V1) %>%
  arrange(V2) %>%
  mutate(V2 = V2[1])
# Source: local data frame [9 x 2]
# Groups: V1 [3]
#      V1    V2
#   (chr) (int)
# 1     A     1
# 2     A     1
# 3     A     1
# 4     B     2
# 5     B     2
# 6     B     2
# 7     C    NA
# 8     C    NA
# 9     C    NA
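
If reordering the rows is not acceptable, or a group might hold more than one distinct value, the sorting step can be dropped. The following is only a sketch of that variant, not the answerer's code: it picks the first non-NA value in row order with V2[!is.na(V2)][1]. The data frame y is rebuilt here as an assumption about its shape, since the answer does not show it.

```r
library(dplyr)

# Assumed shape of `y`: a group column V1 and a value column V2
y <- data.frame(
  V1 = rep(c("A", "B", "C"), each = 3),
  V2 = c(1L, NA, NA, NA, 2L, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

res <- y %>%
  group_by(V1) %>%
  # first non-NA value per group; NA if the group has none
  mutate(V2 = V2[!is.na(V2)][1]) %>%
  ungroup()

res
```

Because nothing is sorted, the rows stay in their original order.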

Answer 2: (score: 4)

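You can also use fill from tidyr. The first call fills downward within each group, and the second fills upward to cover NAs that precede the first known value: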

library(dplyr)
library(tidyr)

df1 %>%
  group_by(ID) %>%
  fill(v1) %>%
  fill(v1, .direction = "up")

Result:

# A tibble: 9 x 2
# Groups:   ID [3]
     ID    v1
  <chr> <int>
1     A     1
2     A     1
3     A     1
4     B     2
5     B     2
6     B     2
7     C    NA
8     C    NA
9     C    NA

Data: dput credit to @akrun.
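
The order of the two fill() calls matters once a group holds more than one known value: filling down first means each NA takes the nearest preceding value, and only leading NAs take the value that follows them. The sketch below illustrates this on a made-up data frame (df2 is hypothetical, not from the question). It uses .direction = "downup", which does both passes in one call; as far as I know this option needs tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: group A has two known values, group B has none
df2 <- data.frame(
  ID = c("A", "A", "A", "A", "A", "B", "B"),
  v1 = c(NA, 1L, NA, 3L, NA, NA, NA),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  group_by(ID) %>%
  # fill down within each group, then up for any leading NAs
  fill(v1, .direction = "downup") %>%
  ungroup()

res$v1
# 1 1 1 3 3 NA NA
```

Group B has nothing to fill from, so it stays NA, matching the behavior shown for group C above.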