更快的方法R while循环最小二乘函数

时间:2012-04-23 15:24:37

标签: r

我正在尝试加速下面的函数(用于以后的自举),该函数执行x和y都有误差的直线的最小二乘拟合。我认为主要的挂起是在while循环中。该函数的输入值是观察xy以及这些值sxsy中的绝对不确定性。

york <- function(x, y, sx, sy){

    x <- cbind(x)
    y <- cbind(y)

    # initial least squares regression estimation
    fit <- lm(y ~ x)
    a1 <- as.numeric(fit$coefficients[1])   # intercept
    b1 <- as.numeric(fit$coefficients[2])   # slope
    e1 <- cbind(as.numeric(fit$residuals))  # residuals
    theta.fit <- rbind(a1, b1)

    # constants
    rho.xy <- 0     # correlation between x and y

    # initialize york regression
    X <- cbind(1, x)
    a <- a1
    b <- b1
    tol <- 1e-15    # tolerance
    d <- tol
    i = 0

    # york regression
    while (d > tol || d == tol){
        i <- i + 1
        a2 <- a
        b2 <- b
        theta2 <- rbind(a2, b2)
        e <- y - X %*% theta2
        w <- 1 / sqrt((sy^2) + (b2^2 * sx^2) - (2 * b2 * sx * sy * rho.xy))
        W <- diag(w)
        theta <- solve(t(X) %*% (W %*% W) %*% X) %*% t(X) %*% (W %*% W) %*% y

        a <- theta[1]
        b <- theta[2]

        mswd <- (t(e) %*% (W%*%W) %*% e)/(length(x) - 2)
        sfit <- sqrt(mswd)
        Vo <- solve(t(X) %*% (W %*% W) %*% X)
        dif <- b - b2
        d <- abs(dif)
        }

    # format results to data.frame
    th <- data.frame(a, b)
    names(th) <- c("intercept", "slope")
    ft <- data.frame(mswd, sfit)
    names(ft) <- c("mswd", "sfit")
    df <- data.frame(x, y, sx, sy, as.vector(e), diag(W))
    names(df) <- c("x", "y", "sx", "sy", "e", "W")

    # store output results
    list(coefficients = th,
        vcov = Vo,
        fit = ft,
        df = df)
}

1 个答案:

答案 0 :(得分:3)

只需进行一些简单的更改即可加快您的功能。首先,你应该移动while循环中不需要的东西。例如,您对同一数据运行solve两次。此外,当您仅在while循环的最后一次迭代中使用它时,您在每次迭代时计算sfit

这是我的代码:

york.fast <- function(x, y, sx, sy, tol=1e-15){
    # initial least squares regression estimation
    fit <- lm(y ~ x)
    theta <- fit$coefficients
    # initialize york regression
    X <- cbind(1, x)
    d <- tol
    # york regression
    while (d >= tol){
        b2 <- theta[2]
        # w <- 1 / sqrt((sy^2) + (b2^2 * sx^2) - (2 * b2 * sx * sy * rho.xy)) # rho.xy is always zero!
        w <- 1 / sqrt(sy^2 + (b2^2 * sx^2))  # rho.xy is always zero!
        # W <- diag(w)
        # w2 <- W %*% W
        w2 <- diag(w^2) # As suggested in the comments.
        base <- crossprod(X,w2)
        Vo <- solve(base %*% X)
        theta <- Vo %*% base %*% y
        d <- abs(theta[2] - b2)
     }
     e <- y - X %*% theta
     mswd <- (crossprod(e,w2) %*% e) / (length(x) - 2)
     sfit <- sqrt(mswd)

    # format results to data.frame
    th <- data.frame(intercept=theta[1], slope=theta[2])
    ft <- data.frame(mswd=mswd, sfit=sfit)
    df <- data.frame(x=x, y=y, sx=sx, sy=sy, e=as.vector(e), W=diag(diag(w)))

    # store output results
    list(coefficients = th, vcov = Vo, fit = ft, df = df)
}

一点点测试:

n=225
set.seed(1)
x=rnorm(n)
y=rnorm(n)
sx=rnorm(n)
sy=rnorm(n)

system.time(test<-york.fast(x,y,sx,sy)) # 0.37 s
system.time(gold<-york(x,y,sx,sy)) # 1.28 s

我注意到rho.xy始终固定为零。这可能是个错误吗?

我也注意到您经常使用cbindvector转换为带有一列的matrix。所有向量都自动被视为具有一列的矩阵,因此您可以避免大量额外的代码。

正如@joran所提到的,容差水平设置得很小,以至于需要很长时间才能收敛;考虑使用更大的容差。