Question

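Is there a simple way to determine whether one vector is nested in another? In other words, in the example below, each value of bar is associated with one and only one value of foo, so bar is nested in foo.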

data.frame(foo=rep(seq(4), each=4), bar=rep(seq(8), each=2))

To clarify, this is the desired result:

foo <- rep(seq(4), each=4)
bar <- rep(seq(8), each=2)
qux <- rep(seq(8), times=2)
# using a fake operator for illustration:
bar %is_nested_in% foo  # should return TRUE
qux %is_nested_in% foo  # should return FALSE

Answer 1

Suppose you have two factors, f and g, and you want to know whether g is nested in f.

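Method 1: for those who like linear algebra

Consider the design matrices of the two factors (f and g must be factor objects here; a plain numeric vector would yield a single numeric column instead of one indicator column per level):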

Xf <- model.matrix(~ f + 0)
Xg <- model.matrix(~ g + 0)

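If g is nested in f, then the column space of Xf must be a subspace of the column space of Xg. In other words, for any linear combination of Xf's columns, y = Xf %*% bf, the equation Xg %*% bg = y can be solved exactly.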

y <- Xf %*% rnorm(ncol(Xf))  ## a random linear combination of `Xf`'s columns
c(crossprod(round(.lm.fit(Xg, y)$residuals, 8)))  ## least squares residuals
## if this is 0, you have nesting.
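
The steps above can be wrapped into a small helper and tried on the vectors from the question. This is only a sketch: the name is_nested_lm is made up for illustration, and the inputs are passed through factor() on the assumption that they may arrive as integer vectors like foo and bar.

```r
# Hypothetical helper (name chosen for illustration): TRUE if g is nested in f.
is_nested_lm <- function(g, f) {
  # factor() also drops unused levels, so every column of Xg is non-zero
  f <- factor(f)
  g <- factor(g)
  Xf <- model.matrix(~ f + 0)
  Xg <- model.matrix(~ g + 0)
  # random linear combination of Xf's columns
  y <- Xf %*% rnorm(ncol(Xf))
  # residual sum of squares after rounding away floating-point noise
  rss <- c(crossprod(round(.lm.fit(Xg, y)$residuals, 8)))
  rss == 0
}

set.seed(1)  # only to make the random combination reproducible
foo <- rep(seq(4), each = 4)
bar <- rep(seq(8), each = 2)
qux <- rep(seq(8), times = 2)

is_nested_lm(bar, foo)  # TRUE
is_nested_lm(qux, foo)  # FALSE
```

Because y is random, a non-nested pair could in principle produce a zero residual by coincidence, but only if the drawn coefficients agree to within the rounding tolerance, which is vanishingly unlikely.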

Method 2: for those who like statistics

We examine the contingency table:

M <- table(f, g)

If every column has exactly one non-zero entry, then g is nested in f. In other words:

all(colSums(M > 0L) == 1L)
## `TRUE` if you have nesting

Remark: with either method, you can easily condense the code into a single line.
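
As one possible one-liner, Method 2 can be turned into the operator the question used as a placeholder. The operator %is_nested_in% is not part of base R; it is defined here purely for illustration.

```r
# Hypothetical operator named after the question's placeholder; not a base R function.
# Left-hand side g is nested in right-hand side f if every column of table(f, g)
# has exactly one non-zero cell.
`%is_nested_in%` <- function(g, f) all(colSums(table(f, g) > 0L) == 1L)

foo <- rep(seq(4), each = 4)
bar <- rep(seq(8), each = 2)
qux <- rep(seq(8), times = 2)

bar %is_nested_in% foo  # TRUE
qux %is_nested_in% foo  # FALSE
```

Two caveats with this sketch: table() drops NA values by default, and if g is a factor with unused levels, those levels show up as all-zero columns and make the result FALSE, so you may want to call droplevels() on it first.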

Answer 2

I think this will work:

nested_in <- function(b, a) {
    # b is nested in a if each value of b occurs with at most one distinct value of a
    df <- data.frame(a, b)
    all(sapply(split(df, df$b), function(i) length(unique(i$a)) < 2))
}

foo <- rep(seq(4), each=4)
bar <- rep(seq(8), each=2)
qux <- rep(seq(8), times=2)

nested_in(bar, foo)  # TRUE
nested_in(qux, foo)  # FALSE
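
The same check can also be written without split(), as a sketch under the same definition of nesting: reduce the data to its distinct (a, b) pairs, then b is nested in a exactly when no value of b appears in more than one pair. The name nested_in_pairs is hypothetical.

```r
# Hypothetical alternative to nested_in(), using distinct (a, b) pairs.
nested_in_pairs <- function(b, a) {
  pairs <- unique(data.frame(a, b))  # one row per distinct combination
  anyDuplicated(pairs$b) == 0L       # a repeated b means it maps to 2+ values of a
}

foo <- rep(seq(4), each = 4)
bar <- rep(seq(8), each = 2)
qux <- rep(seq(8), times = 2)

nested_in_pairs(bar, foo)  # TRUE
nested_in_pairs(qux, foo)  # FALSE
```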

Test whether one factor is nested in another
