Replace NA by interpolation within groups

Time: 2018-07-14 18:49:26

Tags: r

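I would like to know how to replace the NA values in a particular column by interpolating within each group. Some of my groups have only one non-NA value, and for those groups I want to fill every row with that single value.

Suppose I have a data frame like this: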

Group Value
ALB     NA
ALB     10
ALB     NA
ALB     12
ARE     NA
ARE     NA
ARE     2
ARE     NA
ARE     NA
ARG     4
ARG     NA
ARG     6

I want to create a new column so that my data frame looks like this:

Group Value New Column
ALB     NA    9
ALB     10    10
ALB     NA    11
ALB     12    12
ARE     NA    2
ARE     NA    2
ARE     2     2
ARE     NA    2
ARE     NA    2
ARG     4     4
ARG     NA    5
ARG     6     6

4 Answers:

Answer 0 (score: 1)

df <- data.frame(
  group = rep(1:2, each = 4), 
  value = c(NA, 10, NA, 12, 4, NA, NA, 7))

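# Fills x by counting up/down in steps of 1 from the first non-NA value.
# Note: this assumes values change by exactly 1 per row (true for this
# example); it is not general linear interpolation.
complete <- function(x){
  i <- which.min(is.na(x))      # index of the first non-NA element
  y <- seq_along(x) + x[i] - i  # anchor the sequence so that y[i] == x[i]
  return(y)
}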

newdf <- do.call(rbind, 
                 lapply(split(df, df$group), 
                        function(dat){
                          transform(dat, newvalue=complete(value))
                        }))
rownames(newdf) <- NULL

This gives:

> newdf
  group value newvalue
1     1    NA        9
2     1    10       10
3     1    NA       11
4     1    12       12
5     2     4        4
6     2    NA        5
7     2    NA        6
8     2     7        7

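Answer 1 (score: 1)

This one-liner interpolates the NAs by group. For NAs at the ends of a group it extends the nearest non-NA value outward, i.e. it performs linear interpolation with constant extrapolation. That is not exactly what was asked for, but it may be close enough. Note that this also means that if a group has only one non-NA, all of its NAs are set to that value.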

library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))

giving:

   Group Value newCol
1    ALB    NA     10
2    ALB    10     10
3    ALB    NA     11
4    ALB    12     12
5    ARE    NA      2
6    ARE    NA      2
7    ARE     2      2
8    ARE    NA      2
9    ARE    NA      2
10   ARG     4      4
11   ARG    NA      5
12   ARG     6      6
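As a rough base-R sketch of the same idea (linear interpolation inside the observed range, nearest observed value outside it), the following uses approx() with rule = 2. The helper name fill_const is hypothetical, and the explicit branch for groups with a single non-NA is an assumption added here because approx() needs at least two points for linear interpolation.

```r
# Question's data, rebuilt here so the block is self-contained
DF <- data.frame(
  Group = rep(c("ALB", "ARE", "ARG"), times = c(4, 5, 3)),
  Value = c(NA, 10, NA, 12, NA, NA, 2, NA, NA, 4, NA, 6)
)

# Hypothetical helper: linear interpolation between observed points,
# nearest observed value beyond them (rule = 2 in approx()).
fill_const <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) == 0) return(x)                      # nothing to interpolate from
  if (sum(ok) == 1) return(rep(x[ok], length(x)))  # single value: repeat it
  approx(which(ok), x[ok], xout = seq_along(x), rule = 2)$y
}

# ave() applies the helper within each Group and keeps the row order
DF$newCol <- ave(DF$Value, DF$Group, FUN = fill_const)
print(DF)
```

Handling the single-value case inside the helper keeps the result independent of how a particular interpolation package treats groups with only one observation.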

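Note

The input DF in reproducible form is given below. Two versions appear; the second, with the ALB/ARE/ARG groups from the question, is the one that matches the output above.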

DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA, 
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA, 
-8L))

DF <- 
  structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
  2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"), 
  Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
  )), class = "data.frame", row.names = c(NA, -12L))

Answer 2 (score: 0)

Check out na.approx in the zoo package:

https://www.rdocumentation.org/packages/zoo/versions/1.8-2/topics/na.approx

You can use split on the data and then apply na.approx to each piece with lapply:

 > library(zoo)
 > df<- data.frame(
      group = rep(1:2, each = 4), 
      value = c(NA, 10, NA, 12, 4, NA, NA, 7))
 > df
     group value
 1     1    NA
 2     1    10
 3     1    NA
 4     1    12
 5     2     4
 6     2    NA
 7     2    NA
 8     2     7
 > dfl <- split(df$value, df$group)
 > dfl
   $`1`
   [1] NA 10 NA 12

   $`2`
   [1]  4 NA NA  7

 > lapply(dfl, na.approx)
   $`1`
   [1] 10 11 12

   $`2`
   [1] 4 5 6 7

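However, this only works if every group has both a lower and an upper bound, i.e. a non-NA value at each end. Otherwise you run into the problem seen in the first group: there is no way to determine what to replace the leading NA with, so it is dropped and the result has 3 values instead of 4.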
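The length mismatch matters if the result is to be written back as a column. A minimal sketch with base R's approx() (used here instead of na.approx so the block has no package dependency) shows one way to keep the vector length, leaving the positions that cannot be interpolated as NA:

```r
x  <- c(NA, 10, NA, 12)   # first group from the example above
ok <- !is.na(x)

# rule = 1 (the default): positions outside the observed range stay NA,
# but one value is returned for every element of xout
kept <- approx(which(ok), x[ok], xout = seq_along(x), rule = 1)$y

print(kept)
print(length(kept) == length(x))
```

The leading NA is still unresolved here; it is simply kept in place so that the output lines up with the original rows.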

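Answer 3 (score: 0)

Consider the Hmisc::approxExtrap function for interpolating and extrapolating the missing values. You must supply both the x and y values with the NAs removed; these serve as the reference points for interpolating/extrapolating the missing values. After interpolating/extrapolating, the function returns values for the requested positions (passed via the xout argument).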

library(Hmisc)
library(dplyr)

df %>%
  group_by(Group) %>%
  mutate(newVal = approxExtrap(which(!is.na(Value)), Value[!is.na(Value)],
                               xout = 1:n(), rule = 2)$y) %>%
  as.data.frame()

#   Group Value newVal
# 1     1    NA      9
# 2     1    10     10
# 3     1    NA     11
# 4     1    12     12
# 5     2     4      4
# 6     2    NA      5
# 7     2    NA      6
# 8     2     7      7

Data:

df <- read.table(text =
"Group Value
1     NA
1     10
1     NA
1     12
2     4
2     NA
2     NA
2     7",
header = TRUE)
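The data above has at least two non-NA values in every group. For the question's data, where the ARE group has only one, a base-R sketch of linear interpolation with linear extrapolation might look like the following. The helper name fill_linear is hypothetical, and repeating the single value for such groups is an assumption taken from the question's stated requirement rather than from approxExtrap itself.

```r
# Question's data, rebuilt here so the block is self-contained
DF <- data.frame(
  Group = rep(c("ALB", "ARE", "ARG"), times = c(4, 5, 3)),
  Value = c(NA, 10, NA, 12, NA, NA, 2, NA, NA, 4, NA, 6)
)

# Hypothetical helper: interpolate linearly between observed points and
# continue the slope of the first/last segment beyond them.
fill_linear <- function(y) {
  x  <- seq_along(y)
  ok <- !is.na(y)
  if (sum(ok) == 0) return(y)                      # nothing to work from
  if (sum(ok) == 1) return(rep(y[ok], length(y)))  # single value: repeat it
  xs <- x[ok]; ys <- y[ok]; k <- length(xs)
  out <- approx(xs, ys, xout = x)$y                # NA outside the observed range
  left  <- x < xs[1]
  right <- x > xs[k]
  out[left]  <- ys[1] + (x[left]  - xs[1]) * (ys[2] - ys[1]) / (xs[2] - xs[1])
  out[right] <- ys[k] + (x[right] - xs[k]) * (ys[k] - ys[k - 1]) / (xs[k] - xs[k - 1])
  out
}

DF$newVal <- ave(DF$Value, DF$Group, FUN = fill_linear)
print(DF)
```

On this data the new column should reproduce the table requested in the question, including the leading 9 in ALB and the constant 2 throughout ARE.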