Question

I am comparing two lists of formulas to see whether I can reuse some previously computed models. Right now, I am doing it like this:

set.seed(123)

# create some random formulas
l1 <- l2 <- list()
for (i in 1:10) {
  l1[[i]] <- as.formula(paste("z ~", paste(sample(letters, 3), collapse = " + ")))
  l2[[i]] <- as.formula(paste("z ~", paste(sample(letters, 3), collapse = " + ")))
}
# at least one appears in the other list
l1[[5]] <- l2[[7]]

# helper function to convert formulas to character strings
as.formulaCharacter <- function(x) paste(deparse(x))

# convert both lists to strings
s1 <- sapply(l1, as.formulaCharacter)
s2 <- sapply(l2, as.formulaCharacter)

# look up elements of one vector in the other
idx <- match(s1, s2, nomatch = 0L) # positions in s2 (0 = no match); includes 7
s2[idx] # found matching elements (idx indexes s2, not s1)

However, I noticed that some formulas are not retrieved even though they are actually equivalent.

f1 <- z ~ b + c + b:c
f2 <- z ~ c + b + c:b

match(as.formulaCharacter(f1), as.formulaCharacter(f2)) # no match

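I understand why the result differs: the strings are not the same. But I am struggling to extend this approach so that it also works for formulas whose elements are reordered. I could use strsplit to sort all formula components independently first, but that sounds horribly inefficient to me.

Any ideas?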

Answer 1

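If the formulas are restricted to a sum of terms, each consisting of variables separated by colons, then we can build a standardized string: extract the term labels, split each one on the colons, sort the pieces, paste them back together, sort the resulting terms, and convert the result back into a formula string.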

stdize <- function(fo) {
  s <- strsplit(attr(terms(fo), "term.labels"), ":")  # use the argument fo, not the global f2
  terms <- sort(sapply(lapply(s, sort), paste, collapse = ":"))
  format(reformulate(terms, all.vars(fo)[1]))
}

stdize(f1) == stdize(f2)
## [1] TRUE
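
To apply this to two lists, one option is to standardize every formula once and then run match on the resulting strings. The following is only a minimal sketch under the same restriction as above (right-hand sides that are sums of variables and colon interactions). The example lists l1 and l2 and the names k1 and k2 are made up for illustration, and stdize is repeated so that the snippet runs on its own.

```r
# Canonical string for a formula: sort the variables inside each
# interaction, then sort the terms themselves.
stdize <- function(fo) {
  s <- strsplit(attr(terms(fo), "term.labels"), ":")
  terms <- sort(sapply(lapply(s, sort), paste, collapse = ":"))
  format(reformulate(terms, all.vars(fo)[1]))
}

# Hypothetical example lists; two formulas in l1 reappear in l2 reordered
l1 <- list(z ~ b + c + b:c, z ~ a + d, z ~ x + y)
l2 <- list(z ~ q + r, z ~ c + b + c:b, z ~ d + a)

# Standardize each formula once, then compare the strings
k1 <- vapply(l1, stdize, character(1))
k2 <- vapply(l2, stdize, character(1))

# Position in l2 of each formula in l1, NA if it does not occur
idx <- match(k1, k2)
idx
## [1]  2  3 NA
```

Each formula is standardized only once, so the cost grows with the total number of formulas rather than with the number of pairs. Note that vapply with character(1) assumes format returns a single string per formula; a very long formula may be split over several strings and would need to be collapsed first.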
