I find that whenever I subset a dataset I end up writing the same pattern over and over, and I would like to simplify it:
subset(test, (X1 == 2 | is.na(X1)) & (X2 > 4 | is.na(X2)) )
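Basically, I need NAs in a column to count as matching the subset condition as well. I am looking for a function that gives the same result as the call above but, ideally, only takes the equality conditions and generates the NA conditions itself (or maybe there is a pattern for this?):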
na_subset(data, X1 == 2 & X2 > 4)
Some sample data:
test = structure(list(X1 = c(3L, NA, 7L, NA, 2L, 6L, 4L, 9L, 4L, 5L),
X2 = c(0L, 4L, 5L, 5L, NA, 5L, 8L, 7L, 2L, NA)), .Names = c("X1",
"X2"), row.names = c(NA, -10L), class = "data.frame")
Example query:
> subset(test, (X1 == 2 | is.na(X1)) & (X2 > 4 | is.na(X2)) )
  X1 X2
4 NA  5
5  2 NA
Answer 0 (score: 2)
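You may want to test this more thoroughly, but it works at least for the test case you showed. Here subsetNA is identical to subset.data.frame except for the line marked ##: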
subsetNA <-
function (x, subset, select, drop = FALSE, ...)
{
    r <- if (missing(subset))
        rep_len(TRUE, nrow(x))
    else {
        e <- substitute(subset)
        r <- eval(e, x, parent.frame())
        if (!is.logical(r))
            stop("'subset' must be logical")
        r | is.na(r) ##
    }
    vars <- if (missing(select))
        TRUE
    else {
        nl <- as.list(seq_along(x))
        names(nl) <- names(x)
        eval(substitute(select), nl, parent.frame())
    }
    x[r, vars, drop = drop]
}
Testing it:
> subset(test, (X1 == 2 | is.na(X1)) & (X2 > 4 | is.na(X2)) )
  X1 X2
4 NA  5
5  2 NA
> subsetNA(test, X1 == 2 & X2 > 4)
  X1 X2
4 NA  5
5  2 NA
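The line marked ## relies on R's three-valued logic: a conjunction evaluates to NA only when no operand is FALSE and at least one is NA, and those are exactly the rows the hand-written pattern keeps. Below is a minimal sketch of that reasoning. The vectors x1 and x2 are made up for illustration and are not part of the question's data.

```r
# R's logical operators are three-valued: NA means "unknown".
NA & TRUE    # NA: the result depends on the unknown value
NA & FALSE   # FALSE: a single FALSE settles the conjunction

# Hypothetical vectors, chosen to cover the NA cases
x1 <- c(2L, NA, NA, 5L)
x2 <- c(NA, 5L, 4L, NA)

# Plain condition, as it would be passed to subsetNA
r <- x1 == 2 & x2 > 4
# Treat "unknown" as a match, which is what the ## line does
keep <- r | is.na(r)

# The hand-written pattern from the question
manual <- (x1 == 2 | is.na(x1)) & (x2 > 4 | is.na(x2))

identical(keep, manual)
```

The third element shows the case that matters: x1 is NA, but x2 fails its condition, so the conjunction is FALSE rather than NA and the row is dropped by both versions.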
Answer 1 (score: 0)
Not perfect, but:
data <- data.frame(V1=1:10, V2=c(1:5, NA, 1:4))
subset(data, V1 == 1 & V2 == 1 | is.na(V1 + V2))
Which produces:
  V1 V2
1  1  1
6  6 NA
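One likely reason this is "not perfect": is.na(V1 + V2) keeps every row that has an NA in either column, even when the other column fails its condition. The sketch below applies the same pattern to the sample data from the question and compares it with the explicit query. The names wanted and loose are only illustrative.

```r
# Sample data from the question, rebuilt with data.frame()
test <- data.frame(
  X1 = c(3L, NA, 7L, NA, 2L, 6L, 4L, 9L, 4L, 5L),
  X2 = c(0L, 4L, 5L, 5L, NA, 5L, 8L, 7L, 2L, NA)
)

# Explicit pattern from the question
wanted <- subset(test, (X1 == 2 | is.na(X1)) & (X2 > 4 | is.na(X2)))

# Pattern from this answer: any NA in either column is kept
loose <- subset(test, X1 == 2 & X2 > 4 | is.na(X1 + X2))

rownames(wanted)  # "4" "5"
rownames(loose)   # "2" "4" "5" "10"
```

Rows 2 and 10 are kept only by the second query: row 2 has X2 = 4 and row 10 has X1 = 5, so each fails the condition on its non-missing column.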