
Question

Suppose you are working with a regression model in which at least one predictor is estimated via a spline, e.g.,

library(splines)
data(diamonds, package = "ggplot2")

fit <- lm(price ~ bs(depth, degree = 5) + bs(carat, knots = c(2, 3)) * color, 
          data = diamonds)

The fit above is for illustrative purposes only; there is no meaningful reason behind it.

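Now let's keep the same basic formula but change the knot locations for both depth and carat. The update needs to happen dynamically, because it may be part of a larger MCMC procedure (with the number and locations of the knots determined by reversible-jump or birth/death steps).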

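I'm well aware of update and update.formula, but I don't believe these tools will help. The following pseudocode should illustrate the behavior of the function I plan to develop.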

foo <- function(formula, data) { 

  # Original model matrix; the formula has the same form as in the example above
  Xmat_orig <- model.matrix(formula, data)

  # some fancy method for selecting new knot locations here
  # lots of cool R code....

  # pseudo code for the 'new knots'.  In the example formula above var1 would be
  # depth and var2 would be carat.  The number of elements in this list would be
  # dependent on the formula passed into foo.
  new_knots <- list(k1 = knot_locations_for_var1, 
                    k2 = knot_locations_for_var2)

  # updated model matrix: 
  # pseudo code for what the new model matrix call would look like:
  Xmat_new <- 
    model.matrix(y ~ bs(var1, degree = 5, knots = new_knots$k1) + bs(var2, knots = new_knots$k2) * color, 
                 data = data)

  return(Xmat_new) 
}

Can anyone suggest a way to modify the knots argument of the bs or ns calls dynamically?

Answer 1

You can use R's substitute function. From its documentation:

substitute(expr, env): substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.

For example:

> rm(list=ls())
> x <- 1
> x + y
Error: object 'y' not found

This fails because y is not defined. Now with substitute:

> (expr <- substitute(x + y, list(y=2)))
x + 2
> eval(expr)
[1] 3
> z <- 2
> (expr <- substitute(x + y, list(y=z)))
x + 2
> eval(expr)
[1] 3

In your example:

f1 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                       list(deg=5, knts=c(2, 3))))
f2 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
                       list(deg=6, knts=c(3, 4))))

fit1 <- lm(f1, data=diamonds)
fit2 <- lm(f2, data=diamonds)

More generally, you can wrap the substitute call in a function, for example:

formula.with.knots <- function(degree, knots) {
  f.expr <- substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                        list(deg=degree, knts=knots))

  eval(f.expr)
}

f <- formula.with.knots(5, c(2, 3))
fit <- lm(f, data = diamonds)
summary(fit)
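A closely related sketch uses base R's bquote() instead of substitute(): inside bquote(), .(x) is replaced by the current value of x, and the rest of the expression stays unevaluated. The helper name formula_with_knots_bq and its two arguments are hypothetical. They are chosen to mirror the question's list of one knot vector per spline term.

```r
library(splines)  # provides bs(); only needed once the formula is fitted

# Hypothetical helper: build the question's formula with one knot vector per
# spline term. bquote() splices in the values of k_depth and k_carat and
# leaves everything else as an unevaluated expression.
formula_with_knots_bq <- function(k_depth, k_carat) {
  expr <- bquote(price ~ bs(depth, degree = 5, knots = .(k_depth)) +
                   bs(carat, knots = .(k_carat)) * color)
  # Evaluating the call to `~` creates the formula object; as with
  # formula.with.knots above, its environment is this function's frame.
  eval(expr)
}

f <- formula_with_knots_bq(k_depth = c(60, 62), k_carat = c(1, 2))
print(f)
```

The result can be passed to lm(f, data = diamonds) exactly like the substitute() version. The choice between the two is mostly a matter of taste.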

Answer 2

Here is another possibility that is less picky about the input to your function. Consider this:

newknots <- function(form, data, calls=c("bs","ns")) {
    nk <- function(x) { 
        sort(runif(sample(1:5, 1), min = min(data[[x]]), max = max(data[[x]])))
    }
    rr <- function(x, nk, calls) {
        if(is.call(x) && deparse(x[[1]]) %in% calls) {
            x$knots = nk(deparse(x[[2]]))
            x
        } else if (is.recursive(x)) {
            as.call(lapply(as.list(x), rr, nk, calls))
        } else {
            x
        }
    }
    z <- lapply(as.list(form), rr, nk, calls)   
    z <- eval(as.call(z))
    environment(z) <- environment(form)
    z
}

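So this isn't a trivial function, but hopefully it's not too bad. The idea is that we can turn the formula into a list object and walk it recursively, which is what the inner rr function does. It takes a list and looks at each element. It looks for calls to bs or ns, and when it finds one, it replaces the knots= argument.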

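Here we use the nk function to create a new set of knots for a given variable, whose name is passed in as a string. It just needs to return a vector of knot values appropriate for that variable (here, between one and five random points within the variable's range).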

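Finally, I need to convert the list back into a formula and make sure the new object has the same environment as the original formula. So this creates a new formula object and leaves the original untouched (you can overwrite the original if you like).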
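The same recursive walk can also take the new knots from outside instead of generating them internally. This is closer to the new_knots list in the question. Below is a minimal sketch under two assumptions: the knots are supplied as a named list keyed by the variable each bs()/ns() call is built on, and every call is written without a namespace prefix, because splines::bs would not be matched. The name set_knots is hypothetical.

```r
# Hypothetical helper: set the knots= argument of every bs()/ns() call whose
# first argument is a variable named in `knots`; other calls are left alone.
set_knots <- function(form, knots, calls = c("bs", "ns")) {
  walk <- function(x) {
    if (is.call(x) && deparse(x[[1]]) %in% calls) {
      v <- deparse(x[[2]])                             # variable the spline uses
      if (!is.null(knots[[v]])) x$knots <- knots[[v]]  # add or overwrite knots=
      x
    } else if (is.call(x)) {
      as.call(lapply(as.list(x), walk))                # recurse into sub-calls
    } else {
      x                                                # symbols and constants
    }
  }
  out <- eval(walk(form))                  # evaluating `~` rebuilds a formula
  environment(out) <- environment(form)    # keep the original environment
  out
}

f <- price ~ ns(carat, knots = c(1, 3)) * color + bs(depth, degree = 5) + clarity
g <- set_knots(f, list(carat = c(0.5, 1.5), depth = c(60, 62)))
g
```

The existing knots of ns(carat, ...) are overwritten, bs(depth, ...) gains a knots= argument, and f itself is left unchanged.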

Here is an example of how you would call this function.

f <- price ~ ns(carat, knots=c(1,3)) * color + bs(depth, degree = 5) + clarity
newknots(f, diamonds)

# price ~ ns(carat, knots = c(2.09726121873362, 3.94607368792873
# )) * color + bs(depth, degree = 5, knots = c(44.047089480795, 
# 47.8856966942549, 49.7632855847478, 70.9297015387565)) + clarity

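So you can see knots being added or replaced according to our new rule. I'm not sure what other functionality you might need, but hopefully this gives you a good starting point.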

Answer 3

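Formulas are always bound to an environment. So one option is to write the formula with variables in place of the arguments you may want to change, and to assign those variables' values in the formula's environment.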

f <- as.formula("price ~ bs(depth, knots=d_knots) + bs(carat, knots=c_knots) * color", 
                list2env(list(d_knots=c(60, 62), c_knots=c(2, 3))))

Here d_knots and c_knots are given initial values. To change them, modify the values in the formula's environment:

environment(f)$d_knots <- c(58, 61, 64)
environment(f)$c_knots <- c(1, 2, 3)

You can then pass the formula to the modeling function:

fit <- lm(f, data=diamonds)
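This works because model.frame() looks up free variables in the formula's environment, and lm() and model.matrix() both go through it. Overwriting a value there therefore changes the spline basis the next time the formula is used, without rebuilding the formula. The small sketch below uses simulated data. The data frame dat, the variable k and the knot values are hypothetical.

```r
library(splines)

# Simulated data (hypothetical): a smooth signal plus a little noise
set.seed(1)
dat <- data.frame(x = runif(200, 0, 10))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.1)

# The knots are a free variable, k, stored in the formula's own environment
f <- as.formula("y ~ bs(x, knots = k)", env = list2env(list(k = c(3, 6))))

# model.frame() finds k in environment(f), so the basis follows its value
n1 <- ncol(model.matrix(f, dat))   # intercept + 3 (cubic) + 2 knots = 6
environment(f)$k <- c(2, 4, 6, 8)  # e.g. the result of a birth step
n2 <- ncol(model.matrix(f, dat))   # intercept + 3 (cubic) + 4 knots = 8
c(n1, n2)
```

In an MCMC loop, the proposal step would only need to overwrite environment(f)$k before refitting.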

Answer 4

Edit:

Thanks @MrFlick, your solution is exactly what I was looking for.

#original post

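Thanks to @MrFlick and @hadley, whose replies on SO and Twitter helped me arrive at a working solution. The approach needs refinement, but it seems to meet my immediate needs.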

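The function with_new_knots, defined below, parses a formula and modifies its elements via terms. (I'd also like to thank Terry Therneau, author of the survival package; digging through its code showed me how formulas are manipulated when they contain functions such as strata.) I can already think of use cases where this function fails, but the important thing is that the outline of the approach exists, and I can extend and improve it later.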

library(ggplot2)
library(reshape2)
library(dplyr)
library(magrittr)
library(splines)
set.seed(42)

with_new_knots <- function(frm, data, iterations = 5L) { 
  # extract the original formula
  old_terms   <- terms(frm, specials = c("bs", "ns"))

  # reconstruct the rhs of the formula with any interaction terms expanded
  cln     <- colnames(attr(old_terms, "factors")) 
  old_rhs <- paste(cln, collapse = " + ")

  # Extract the spline terms from the old_formula 
  idx              <- attr(old_terms, "specials") %>% unlist   %>% sort
  old_spline_terms <- attr(old_terms, "factors")  %>% rownames %>% extract(idx)

  # grab the variable names which splines are built on
  vars <- all.vars(frm)[idx]

  # define the range for each variable in vars
  rngs <- lapply(vars, function(x) { range(data[[x]]) })

  # for each of the spline terms, randomly generate new knots
  # This is a silly example, something clever will replace it. 

  out <- replicate(iterations, 
                   {
                     new_knots <- lapply(rngs, function(r) { 
                                         kts <- sort(runif(sample(1:5, 1), min = r[1], max = r[2]))
                                         paste0("c(", paste(kts, collapse = ", "), ")")
                             })

                     new_spline_terms <- 
                       mapply(FUN = function(s, k) { sub("\\)$", paste0(", knots = ", k, ")"), s) },
                              s = old_spline_terms,
                              k = new_knots)

                     rhs <- old_rhs
                     for(i in seq_along(old_spline_terms)) { 
                       rhs <- gsub(old_spline_terms[i], new_spline_terms[i], rhs, fixed = TRUE)
                     }

                     f <- as.formula(paste(rownames(attr(old_terms, "factors"))[1], "~", rhs))
                     environment(f) <- environment(frm)
                     return(f)
                   }, 
                   simplify = FALSE) 
  return(out) 
}
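with_new_knots relies on the specials attribute returned by terms(). The sketch below inspects that attribute for the formula from the usage example. It also shows a way to read each spline's variable directly from the variables attribute. That avoids depending on all.vars(frm)[idx] happening to line up with the term order. The helper variable names are hypothetical, and the sketch assumes each spline call takes its variable as the first argument.

```r
# Illustrative only: the formula from the usage example that follows.
f  <- price ~ ns(carat) * color + bs(depth, degree = 5) + clarity
tt <- terms(f, specials = c("bs", "ns"))

# 'specials' gives, for each special, indices into the variables list
# (the response counts as variable 1).
idx <- sort(unlist(attr(tt, "specials")))

# The matching row names of the 'factors' matrix are the spline terms.
spline_terms <- rownames(attr(tt, "factors"))[idx]

# attr(tt, "variables") is the call list(price, ns(carat), ...). Element 1 is
# the function name 'list', so variable i sits at position i + 1, and the
# spline's own first argument is element [[2]] of that call.
vars_call   <- attr(tt, "variables")
spline_vars <- vapply(idx, function(i) deparse(vars_call[[i + 1]][[2]]), "")

spline_terms
spline_vars
```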

Usage example:

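To illustrate the result, a statistically meaningless model is passed through with_new_knots, which returns updated formula objects in which the spline calls carry new knots.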

f <- price ~ ns(carat) * color + bs(depth, degree = 5) + clarity
with_new_knots(f, diamonds)


orig_fit <- predict(lm(f, data = diamonds))
new_fits <- with_new_knots(f, diamonds) %>%
            lapply(., function(frm) { predict(lm(frm, data = diamonds)) })

dat <- data.frame(orig_fit, new_fits)
names(dat)[2:6] <- paste("new knots", 1:5)
dat <- melt(dat, id.vars = NULL)
dat <- cbind(dat, diamonds)

ggplot(dat) + 
aes(x = carat, y = value, color = color, shape = clarity) + 
geom_line() + 
geom_point(aes(y = price), alpha = 0.1) + 
facet_wrap( ~ variable, scales = "free")

[Figure: Illustration of the different models with different knots]

Modifying function calls in a formula
