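I would like to know how to replace the NA values in a specific column, by group, using interpolation. Some of my groups have only one non-NA value, and I want to fill those groups with that single value.
If I have a data frame like this: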
Group Value
ALB NA
ALB 10
ALB NA
ALB 12
ARE NA
ARE NA
ARE 2
ARE NA
ARE NA
ARG 4
ARG NA
ARG 6
I want to create a new column, so that my data frame looks like this:
Group Value New Column
ALB NA 9
ALB 10 10
ALB NA 11
ALB 12 12
ARE NA 2
ARE NA 2
ARE 2 2
ARE NA 2
ARE NA 2
ARG 4 4
ARG NA 5
ARG 6 6
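Answer 0 (score: 1)
df <- data.frame(
  group = rep(1:2, each = 4),
  value = c(NA, 10, NA, 12, 4, NA, NA, 7))

# Anchor on the first non-NA value and count up or down by 1 per row.
# Note: this assumes the values change by exactly 1 from one row to the next.
complete <- function(x){
  i <- which.min(is.na(x))   # index of the first non-NA value
  y <- seq_along(x) + x[i] - i
  return(y)
}

# Apply complete() within each group, then stack the pieces back together.
newdf <- do.call(rbind,
  lapply(split(df, df$group),
         function(dat){
           transform(dat, newvalue = complete(value))
         }))
rownames(newdf) <- NULL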
This gives:
> newdf
group value newvalue
1 1 NA 9
2 1 10 10
3 1 NA 11
4 1 12 12
5 2 4 4
6 2 NA 5
7 2 NA 6
8 2 7 7
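Note that complete() steps by exactly 1 per row from the first non-NA value, which happens to fit this sample data. For values with a different spacing, or for a group with a single non-NA value as in the question, something more general is needed. Below is a minimal base-R sketch, not a definitive implementation: it interpolates linearly, extends the nearest segment's slope past the ends, and falls back to a constant when a group has only one non-NA value. The name fill_linear and the column name NewColumn are hypothetical, chosen only for illustration.

```r
# Sample data shaped like the question's example (values stored as doubles).
DF <- data.frame(
  Group = rep(c("ALB", "ARE", "ARG"), times = c(4, 5, 3)),
  Value = c(NA, 10, NA, 12, NA, NA, 2, NA, NA, 4, NA, 6))

# fill_linear() is a hypothetical helper, not a function from any package.
fill_linear <- function(x) {
  pos <- seq_along(x)
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)                       # nothing to anchor on
  if (length(obs) == 1) return(rep(x[obs], length(x)))  # single value: constant fill
  # Linear interpolation between observed points; NA outside their range.
  y <- approx(obs, x[obs], xout = pos)$y
  # Extend the first segment's slope to the left ...
  a <- obs[1]; b <- obs[2]
  left <- pos < a
  y[left] <- x[a] + (x[b] - x[a]) / (b - a) * (pos[left] - a)
  # ... and the last segment's slope to the right.
  k <- length(obs); p <- obs[k - 1]; q <- obs[k]
  right <- pos > q
  y[right] <- x[q] + (x[q] - x[p]) / (q - p) * (pos[right] - q)
  y
}

# ave() applies the helper within each group and keeps the original row order.
DF$NewColumn <- ave(DF$Value, DF$Group, FUN = fill_linear)
DF
```

On the question's data this should reproduce the requested column: 9, 10, 11, 12 for ALB, all 2s for ARE, and 4, 5, 6 for ARG.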
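Answer 1 (score: 1)
This one-liner interpolates the NAs by group. For NAs at the ends of a group it extends the nearest non-NA value, i.e. it performs linear interpolation and constant extrapolation, which is not exactly what was asked for but may be close enough. Note that this also means that if a group has only one non-NA value, all of its NAs are set to that value.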
library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))
giving:
Group Value newCol
1 ALB NA 10
2 ALB 10 10
3 ALB NA 11
4 ALB 12 12
5 ARE NA 2
6 ARE NA 2
7 ARE 2 2
8 ARE NA 2
9 ARE NA 2
10 ARG 4 4
11 ARG NA 5
12 ARG 6 6
Data (the second definition below, with the ALB/ARE/ARG groups, is the input used for the output above):
DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA,
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA,
-8L))
DF <-
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"),
Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
)), class = "data.frame", row.names = c(NA, -12L))
Answer 2 (score: 0)
Check out na.approx in the zoo package:
https://www.rdocumentation.org/packages/zoo/versions/1.8-2/topics/na.approx
You can use split on your data and then apply na.approx to each piece with lapply.
> library(zoo)
> df<- data.frame(
group = rep(1:2, each = 4),
value = c(NA, 10, NA, 12, 4, NA, NA, 7))
> df
group value
1 1 NA
2 1 10
3 1 NA
4 1 12
5 2 4
6 2 NA
7 2 NA
8 2 7
> dfl <- split(df$value, df$group)
> dfl
$`1`
[1] NA 10 NA 12
$`2`
[1] 4 NA NA 7
> lapply(dfl, na.approx)
$`1`
[1] 10 11 12
$`2`
[1] 4 5 6 7
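However, this only works if every group has a lower and an upper bound, i.e. a non-NA value at both ends. Otherwise you run into the problem seen in the first group, where the leading NA is dropped because there is no way to determine what to replace it with.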
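One way to keep every row, including leading or trailing NAs, is to interpolate against row positions with base R's approx and rule = 2, which repeats the nearest observed value outside the observed range. A minimal sketch, assuming each group has at least two non-NA values (approx needs two points to interpolate); fill_const_ends is a hypothetical helper name:

```r
df <- data.frame(
  group = rep(1:2, each = 4),
  value = c(NA, 10, NA, 12, 4, NA, NA, 7))

# Hypothetical helper: approx() drops the NA pairs itself, interpolates at
# every row position, and with rule = 2 holds the nearest observed value
# constant beyond the first and last observation.
fill_const_ends <- function(x) {
  pos <- seq_along(x)
  approx(pos, x, xout = pos, rule = 2)$y
}

# Each piece keeps its original length, so unsplit() can restore row order.
pieces <- lapply(split(df$value, df$group), fill_const_ends)
df$newvalue <- unsplit(pieces, df$group)
df
```

Here the leading NA in group 1 becomes 10 (constant extrapolation) rather than being dropped, so the result lines up with the original rows.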
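Answer 3 (score: 0)
Consider using the Hmisc::approxExtrap function to both interpolate and extrapolate the missing values. You have to supply the x and y values with the NAs removed; these serve as the reference points for interpolating/extrapolating the missing values. The function returns the interpolated/extrapolated values at the positions passed in the xout argument.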
library(Hmisc)
library(dplyr)
df %>% group_by(Group) %>%
  mutate(newVal =
    approxExtrap(which(!is.na(Value)), Value[!is.na(Value)], xout = 1:n(), rule = 2)$y) %>%
  as.data.frame()
# Group Value newVal
# 1 1 NA 9
# 2 1 10 10
# 3 1 NA 11
# 4 1 12 12
# 5 2 4 4
# 6 2 NA 5
# 7 2 NA 6
# 8 2 7 7
Data:
df <- read.table(text =
"Group Value
1 NA
1 10
1 NA
1 12
2 4
2 NA
2 NA
2 7",
header = TRUE)