Missing value where TRUE/FALSE needed in R

Time: 2014-04-22 02:39:07

Tags: r function statistics regression gradient-descent

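When I run the code below without commenting out gr.ascent(MMSE, 0.5, verbose=TRUE), I get the error Error in b1 * x : 'b1' is missing. But when I comment that line out and test MMSE with the arguments MMSE(2,1,farmland$farm,farmland$area), I get the following error. Do you know where my problem is?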

Error in if (abs(t[i]) <= k) { : missing value where TRUE/FALSE needed

Here is my code:

farmland <- read.csv("FarmLandArea.csv")
str(farmland)
fit=lm(farm~land,data=farmland)
mean.squared.residuals <- sum((lm(farm~land,data=farmland)$residuals)^2)/(length(farmland$farm)-2)

#gradient descent

#things I should possibly use: solve(t(X)%*%X, t(X)%*%y)
gr.ascent<- function(df, x0, alpha=0.2, eps=0.001, max.it = 50, verbose = FALSE){
  X1 <- x0
  cond <- TRUE
  iteration <- 0
  if(verbose) cat("X0 =",X1,"\n")
  while(cond){
    iteration <- iteration + 1
    X0 <- X1
    X1 <- X0 + alpha * df(X0)
    cond <- sum((X1 - X0)^2) > eps & iteration < max.it
    if(verbose) cat(paste(sep="","X",iteration," ="), X1, "\n")
  }
  return(X1)
}


k=19000

#rho <- function(t, k=19000){
#  for (i in seq(1,length(t))){
#    if (abs(t[i]) <= k)
#      return(t[i]^2)
#    else 
#      return(2*k*abs(t[i])-k^2)

#  }

#}

#nicer implementation of rho. ifelse works on vector
rho<-function(t,k) ifelse(abs(t)<=k,t^2,(2*k*abs(t))-k^2)
rho.prime <- function(t, k=19000){
  out <- rep(NA, length(t))
  for (i in seq(1,length(t))){
    if (abs(t[i]) <= k)
    { print(2*t[i])
      out[i] <- 2*t[i] 
    }
    else 
    {
      print(2*k*sign(t[i]))
      out[i] <- 2*k*sign(t[i])
    }
  }
  return(out)
}
MMSE <- function(b0, b1, y=farmland$farm, x=farmland$land){
   # Calls rho.prime() here with argument y-b0-b1*x


   #Why should we call rho.prime? in the html page you have used rho!?
  n = length(y)
  total = 0
  for (i in seq(1,n)) {
    #total = total + rho(t,k)*(y[i]-b0-b1*x[i])
    total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])
  }
  return(total/n)
}

gr.ascent(MMSE(1,2), 0.5, verbose=TRUE)

The FarmLand csv data looks like this:

state,land,farm
Alabama,50744,14062
Alaska,567400,1375
Arizona,113635,40781
Arkansas,52068,21406
California,155959,39688
Colorado,103718,48750
Connecticut,4845,625
Delaware,1954,766
Florida,53927,14453
Georgia,57906,16094
Hawaii,6423,1734
Idaho,82747,17812
Illinois,55584,41719
Indiana,35867,23125
Iowa,55869,48125
Kansas,81815,72188
Kentucky,39728,21875
Louisiana,43562,12578
Maine,30862,2109
Maryland,9774,3203
Massachusetts,7840,812
Michigan,58110,15625
Minnesota,79610,42031
Mississippi,46907,17422
...

Here is the result of dput(farmland):

> dput(farmland)
structure(list(state = structure(1:50, .Label = c("Alabama", 
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", 
"Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", 
"Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", 
"Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", 
"New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", 
"Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", 
"South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", 
"Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", 
"Wyoming"), class = "factor"), land = c(50744L, 567400L, 113635L, 
52068L, 155959L, 103718L, 4845L, 1954L, 53927L, 57906L, 6423L, 
82747L, 55584L, 35867L, 55869L, 81815L, 39728L, 43562L, 30862L, 
9774L, 7840L, 58110L, 79610L, 46907L, 68886L, 145552L, 76872L, 
109826L, 8968L, 7417L, 121356L, 47214L, 48711L, 68976L, 40948L, 
68667L, 95997L, 44817L, 1045L, 30109L, 75885L, 41217L, 261797L, 
82144L, 9250L, 39594L, 66544L, 24230L, 54310L, 97105L), farm = c(14062L, 
1375L, 40781L, 21406L, 39688L, 48750L, 625L, 766L, 14453L, 16094L, 
1734L, 17812L, 41719L, 23125L, 48125L, 72188L, 21875L, 12578L, 
2109L, 3203L, 812L, 15625L, 42031L, 17422L, 45469L, 95000L, 71250L, 
9219L, 734L, 1141L, 67500L, 10938L, 13438L, 61875L, 21406L, 55000L, 
25625L, 12109L, 109L, 7656L, 68281L, 17031L, 203750L, 17344L, 
1906L, 12578L, 23125L, 5703L, 23750L, 47188L)), .Names = c("state", 
"land", "farm"), class = "data.frame", row.names = c(NA, -50L
))

1 answer:

Answer 0 (score: 2)

OK, by the numbers:

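  1. In your call to gr.ascent(...), you pass the function MMSE as the first argument. Inside gr.ascent(...), you refer to this function as df(...).
  2. The function MMSE(...) has 2 arguments, b0 and b1, with no default values, so these arguments must be specified or there will be an error, but
  3. when you call df(...) inside gr.ascent(...), in the line X1 <- X0 + alpha * df(X0), you pass only one argument, which is matched to b0.
  4. So the second argument, b1, is missing, hence the error.
  5. When you call MMSE(...) directly, as in: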

    MMSE(2,1,farmland$farm,farmland$area)
    

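    you pass farmland$area as the fourth argument. But there is no column area in the farmland data frame! So farmland$area evaluates to NULL, and when it is used in

    total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])
    

    the t argument to rho.prime(...) becomes a zero-length vector. The loop over seq(1,length(t)) still runs, t[1] is NA, and the if condition cannot be evaluated, hence the second error.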
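The chain of events behind the second error can be reproduced in isolation. This is a minimal sketch that uses only the first three rows of the farmland data as a stand-in; the variable names resid and cond are introduced here for illustration and do not appear in the question.

```r
# Small stand-in for the farmland data (first three rows only)
farmland <- data.frame(land = c(50744L, 567400L, 113635L),
                       farm = c(14062L, 1375L, 40781L))

x <- farmland$area              # no such column, so this is NULL
is.null(x)

# Arithmetic involving NULL gives a zero-length vector
resid <- farmland$farm - 2 - 1 * x
length(resid)

seq(1, length(resid))           # counts down from 1 to 0, so a for loop over it still runs
seq_len(length(resid))          # empty, so a for loop over it would be skipped

# Indexing past the end of a vector gives NA, so the comparison is NA as well
cond <- abs(resid[1]) <= 19000
is.na(cond)
```

Using seq_len(length(t)) or seq_along(t) in the loop would avoid the NA condition for empty input, although the underlying problem here is still the misspelled column name.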

    I can't propose a solution, because I don't know what you are trying to accomplish here.

    EDIT (response to OP's comment).

    Notwithstanding @thelatemail's comment, which I completely agree with, your new error is rather obscure.

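    In your earlier version, you passed the function MMSE(...) to gr.ascent(...) and then used it incorrectly. This time, you are passing gr.ascent(...) the value returned by the call MMSE(1,2). So what happens when you try to treat this value as a function, as in: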

    X1 <- X0 + alpha * df(X0)
    

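    Well, normally this would throw an error telling you that df is not a function. In this case, df is a function, which is just your bad luck. It is the probability density function of the F distribution, which has a required argument df1, among others (type ?df to see the documentation). That is why you get the error.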
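This lookup behaviour can be seen in a small sketch. The helper call.df below is hypothetical and only mimics the df(X0) call made inside gr.ascent(...); the error text is captured rather than printed so the script keeps running.

```r
# Hypothetical helper that mimics the call made inside gr.ascent()
call.df <- function(df, x0) df(x0)

# Passing a number: when R looks up a function to call, it skips the
# non-function value bound to 'df' and finds stats::df instead, which
# then complains about its own missing argument
msg <- tryCatch(call.df(3, 0.5), error = function(e) conditionMessage(e))
msg

# Passing a function: the argument itself is called, as intended
res <- call.df(function(b) 2 * b, 0.5)
res
```

Naming the argument something that does not clash with an existing function, for example grad.fn, would likely have produced a clearer "could not find function" error here.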

    To "fix" this, you need to go back to passing the function, as in:

    gr.ascent(MMSE,...)
    

    and then use it correctly inside gr.ascent(...), as in:

    X1 <- X0 + alpha * df(X0, <some other argument>).
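One way this could look is sketched below. It is not the OP's intended solution, which is unknown. It assumes the goal is to minimise mean(rho(y - b0 - b1*x)) and, instead of passing an extra argument, passes a one-argument function of the parameter vector c(b0, b1). The toy data, the name neg.grad, and the values of alpha, eps and max.it are all assumptions made for illustration.

```r
# Toy data (hypothetical, not the farmland data): y is roughly 1 + 2 * x
x <- c(0, 1, 2, 3, 4)
y <- c(1.1, 2.9, 5.2, 6.8, 9.1)

# Derivative of rho, vectorised with ifelse() instead of a loop
rho.prime <- function(t, k = 19000) ifelse(abs(t) <= k, 2 * t, 2 * k * sign(t))

# Assumed objective: mean(rho(y - b[1] - b[2] * x)). This returns its negative
# gradient with respect to b = c(b0, b1), i.e. the direction that lowers the loss.
neg.grad <- function(b) {
  r <- y - b[1] - b[2] * x
  c(mean(rho.prime(r)), mean(rho.prime(r) * x))
}

# Same update rule as gr.ascent() in the question, without the verbose output
gr.ascent <- function(df, x0, alpha = 0.2, eps = 0.001, max.it = 50) {
  X1 <- x0
  iteration <- 0
  repeat {
    iteration <- iteration + 1
    X0 <- X1
    X1 <- X0 + alpha * df(X0)   # df receives the whole parameter vector
    if (sum((X1 - X0)^2) <= eps || iteration >= max.it) break
  }
  X1
}

# alpha, eps and max.it are chosen for this toy data, not for the farmland data
b.hat <- gr.ascent(neg.grad, x0 = c(0, 0), alpha = 0.05, eps = 1e-12, max.it = 5000)
b.hat
coef(lm(y ~ x))   # least-squares fit, for comparison
```

Because every residual in the toy data is far below k, rho is purely quadratic here and the result should be close to the lm() coefficients. The farmland values are in the tens of thousands, so a much smaller alpha or rescaled data would probably be needed before the same loop converges there.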