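I am trying to replace missing values (NA) in a vector. An NA between two equal numbers should be replaced by that number. An NA between two different values should stay NA. For example, given the vector "a", I want it to become "b".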
a = c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)
b = c(1, 1, 1, 1, 1, NA, NA, NA, 2, 2, 2, 2, 3, 3, 3, 3)
As you can see, the second run of NA, between the values 1 and 2, is not replaced.
Is there a way to vectorize this computation?
Answer 0 (score: 3)
# Define a vector with Leading/Lagging NAs
a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3, NA, NA)
# Save the boolean vector as we are going to reuse it a lot
na_vals <- is.na(a)
# For each NA, find the index of the closest non-NA value to its left (0 for leading NAs)
ind <- findInterval(which(na_vals), which(!na_vals))
# Find the non-NA values that are equal to the next non-NA value
ind2 <- which(!diff(a[!na_vals]))
# Fill only the NAs that lie between equal consecutive values
a[na_vals] <- a[!na_vals][ind2[match(ind, ind2)]]
a
# [1] NA NA 1 1 1 1 1 NA NA NA 2 2 2 2 3 3 3 3 NA NA
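To make the indexing in the code above easier to follow, here is a rough trace of the intermediate vectors on the same input. It is only a sketch: the names `x`, `non_na` and `fill` are introduced here for illustration and are not part of the answer.

```r
# Same input as above, stored under a different name so nothing is overwritten
x <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3, NA, NA)
na_vals <- is.na(x)
non_na  <- x[!na_vals]          # the non-NA values, in order
non_na
# [1] 1 1 2 2 3 3

# For every NA: index (into non_na) of the nearest non-NA value on its left
ind <- findInterval(which(na_vals), which(!na_vals))
ind
#  [1] 0 0 1 1 1 2 2 2 3 3 5 5 6 6

# Indices (into non_na) whose value equals the next non-NA value
ind2 <- which(!diff(non_na))
ind2
# [1] 1 3 5

# An NA is filled only if its left neighbour index appears in ind2
fill <- non_na[ind2[match(ind, ind2)]]
fill
#  [1] NA NA  1  1  1 NA NA NA  2  2  3  3 NA NA
```

Leading NAs get index 0 and trailing NAs get the last index, neither of which can appear in `ind2`, so they stay NA.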
Some timing comparisons on a large vector:
# Create a big vector
set.seed(123)
a <- sample(c(NA, 1:5), 5e7, replace = TRUE)
############################################
##### Cainã Max Couto-Silva
fill_data <- function(vec) {
  for(l in unique(vec[!is.na(vec)])) {
    g <- which(vec %in% l)
    indexes <- list()
    for(i in 1:(length(g) - 1)) {
      indexes[[i]] <- (g[i]+1):(g[i+1]-1)
    }
    for(i in 1:(length(g) - 1)) {
      if(all(is.na(vec[indexes[[i]]]))) {
        vec[indexes[[i]]] <- l
      }
    }
  }
  return(vec)
}
system.time(res <- fill_data(a))
# user system elapsed
# 81.73 4.41 86.48
############################################
##### Henrik
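# na.approx() and na.locf() come from the zoo package; load it outside the timed block
library(zoo)
system.time({
  a_ap <- na.approx(a, na.rm = FALSE)
  a_locf <- na.locf(a, na.rm = FALSE)
  a[which(a_ap == a_locf)] <- a_ap[which(a_ap == a_locf)]
})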
# user system elapsed
# 12.55 3.39 15.98
# Validate
identical(res, as.integer(a))
# [1] TRUE
############################################
##### David
## Recreate a, as it has been overwritten
set.seed(123)
a <- sample(c(NA, 1:5), 5e7, replace = TRUE)
system.time({
  # Save the boolean vector as we are going to reuse it a lot
  na_vals <- is.na(a)
  # For each NA, find the index of the closest non-NA value to its left
  ind <- findInterval(which(na_vals), which(!na_vals))
  # Find the non-NA values that are equal to the next non-NA value
  ind2 <- which(!diff(a[!na_vals]))
  # Fill only the NAs that lie between equal consecutive values
  a[na_vals] <- a[!na_vals][ind2[match(ind, ind2)]]
})
# user system elapsed
# 3.39 0.71 4.13
# Validate
identical(res, a)
# [1] TRUE
Answer 1 (score: 2)
You could write a function like this:
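fill_data <- function(vec) {
  # Loop over each distinct non-NA value
  for (l in unique(vec[!is.na(vec)])) {
    g <- which(vec %in% l)
    indexes <- list()
    # seq_len() gives an empty loop when the value occurs only once,
    # whereas 1:(length(g) - 1) would run over c(1, 0) and fail
    for (i in seq_len(length(g) - 1)) {
      indexes[[i]] <- (g[i] + 1):(g[i + 1] - 1)
    }
    for (i in seq_len(length(g) - 1)) {
      # Fill the gap only if everything inside it is NA
      if (all(is.na(vec[indexes[[i]]]))) {
        vec[indexes[[i]]] <- l
      }
    }
  }
  return(vec)
}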
Running the function (it also works when a value appears at several different positions in the vector):
a = c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)
fill_data(a)
# [1] 1 1 1 1 1 NA NA NA 2 2 2 2 3 3 3 3
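Explanation:
First, it finds the unique non-NA values.
Then it gets the indices of each unique non-NA value and takes the positions between consecutive occurrences.
Then it tests whether the values at those positions are all NA and, if so, replaces them with that value.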
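The three steps above can be traced by hand for a single value. This is only an illustrative sketch: the names `x`, `vals` and `gap` are made up here, and only the value 1 is followed.

```r
x <- c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)

# Step 1: the unique non-NA values
vals <- unique(x[!is.na(x)])
vals
# [1] 1 2 3

# Step 2: positions of one value (here 1) and the positions between
# its first two occurrences
g <- which(x %in% 1)
g
# [1] 1 5
gap <- (g[1] + 1):(g[2] - 1)
gap
# [1] 2 3 4

# Step 3: the gap is filled only if everything inside it is NA
all(is.na(x[gap]))
# [1] TRUE
x[gap] <- 1
x[1:5]
# [1] 1 1 1 1 1
```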
Answer 2 (score: 2)
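You can use convenience functions from the zoo package. Here we replace the NAs in the original vector wherever the interpolated value (created by na.approx) equals the 'last observation carried forward' value (created by na.locf):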
library(zoo)
a_ap <- na.approx(a)
a_locf <- na.locf(a)
a[which(a_ap == a_locf)] <- a_ap[which(a_ap == a_locf)]
a
# [1] 1 1 1 1 1 NA NA NA 2 2 2 2 3 3 3 3
To account for leading and trailing NAs, add na.rm = FALSE:
a <- c(NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3, NA)
a_ap <- na.approx(a, na.rm = FALSE)
a_locf <- na.locf(a, na.rm = FALSE)
a[which(a_ap == a_locf)] <- a_ap[which(a_ap == a_locf)]
a
# [1] NA 1 1 1 1 1 NA NA NA 2 2 2 2 3 3 3 3 NA
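If zoo is not available, a similar idea can be sketched in base R. Note that this swaps the interpolation step for a backward fill: a position is filled only when the last value carried forward and the next value carried backward agree. The helpers `locf_base` and `fill_between_equal` are hypothetical names written for this sketch, not zoo functions.

```r
# Hypothetical helper (not part of zoo): carry the last non-NA value forward
locf_base <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of the latest non-NA so far
  idx[idx == 0] <- NA                      # leading NAs have nothing to carry
  x[idx]
}

# Hypothetical helper: fill an NA run only when both neighbours agree
fill_between_equal <- function(x) {
  fwd <- locf_base(x)            # nearest non-NA value at or left of each position
  bwd <- rev(locf_base(rev(x)))  # nearest non-NA value at or right of each position
  agree <- which(fwd == bwd)     # which() drops the NA comparisons
  x[agree] <- fwd[agree]
  x
}

a <- c(NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3, NA)
fill_between_equal(a)
# [1] NA  1  1  1  1  1 NA NA NA  2  2  2  2  3  3  3  3 NA
```

Leading and trailing NAs are left alone because one of the two fills is NA there, so the comparison never matches.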