Error in nrow(X): object 'X' not found, even though X is defined

Asked: 2019-07-15 08:37:06

Tags: r

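I am trying to implement logistic regression. The code works when I run it by hand, but when I call the function I get the error "Error in nrow(X): object 'X' not found", even though X is defined before the call that reaches nrow(). I am testing with the UCI "Adult" data set.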

If I run the body of the function manually, line by line, there is no error. Can anyone explain this?

# Sigmoid function
sigmoid <- function(z){
  g <- 1/(1+exp(-z))
  return(g)
}

# Cost function
cost <- function(theta){
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}

log_reg <- function(datafr, m){

  # Train/test split
  sample <- sample(1:nrow(datafr), m)
  df_train <- datafr[sample,]
  df_test <- datafr[-sample,]

  num_features <- ncol(datafr) - 1
  num_label <- ncol(datafr)
  label_levels <- levels(datafr[, num_label])
  datafr[, num_features+1] <- ifelse(datafr[, num_label] == names(table(datafr[,num_label]))[1], 0, 1)

  # Predictor variables
  X <- as.matrix(df_train[, 1:num_features])
  X_test <- as.matrix(df_test[, 1:num_features])

  # Add ones to X
  X <- cbind(rep(1, nrow(X)), X)
  X_test <- cbind(rep(1, nrow(X_test)), X_test)

  # Response variable
  Y <- as.matrix(df_train[, num_label] )
  Y <- ifelse(Y == names(table(Y))[1], 0, 1)

  Y_test <- as.matrix(df_test[, num_label] )
  Y_test <- ifelse(Y_test == names(table(Y_test))[1], 0, 1)


  # Initial theta
  initial_theta <- rep(0, ncol(X))

  # Estimate theta by minimizing the cost with optim() (default method: Nelder-Mead, not gradient descent)
  theta_optim <- optim(par=initial_theta, fn=cost)

  predictions <- ifelse(sigmoid(X_test%*%theta_optim$par)>=0.5, 1, 0)


  # Generalization error
  error_rate <- sum(predictions!=Y_test)/length(Y_test)

  return(error_rate)
}

### Adult Data
data <- read.table('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', 
                    sep = ',', fill = FALSE, strip.white = TRUE)
colnames(data) <- c('age', 'workclass', 'fnlwgt', 'education', 
                    'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 
                    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income')

# Feature selection
datafr <- data[, c("age", "education_num", "hours_per_week", "income")]

log_reg(datafr = datafr, m = 20)

1 answer:

Answer 0 (score: 0)

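You are calling cost(), which references X, but X is not defined inside cost(). R looks up a function's free variables in the environment where the function was defined, not in the environment it is called from. cost() is defined at top level, so it searches the global environment and never sees the X that is local to log_reg(). That is also why running the code by hand works: done that way, X ends up in the global environment. You could define cost() inside log_reg(), after X has been defined, or better yet, make X (and Y) arguments of cost():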

cost <- function(theta, X, Y){
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}

and later, inside log_reg():

theta_optim <- optim(par=initial_theta, fn=cost, X=X, Y=Y)
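
To see the fix working end to end, here is a minimal sketch on synthetic data. The helper names make_data() and fit_in_function() and the simulated data are hypothetical stand-ins for the 'adult' training split; only sigmoid(), cost() and the optim() call come from the code above. Extra named arguments given to optim() are forwarded to fn through its `...` argument.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# cost() with the data passed in explicitly, as in the answer.
cost <- function(theta, X, Y) {
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  (1 / n) * sum(-Y * log(g) - (1 - Y) * log(1 - g))
}

# Hypothetical synthetic data: an intercept column plus one feature.
make_data <- function(n_obs = 200) {
  set.seed(1)
  x1 <- rnorm(n_obs)
  list(X = cbind(1, x1),
       Y = rbinom(n_obs, 1, sigmoid(-0.5 + 1.5 * x1)))
}

# X and Y exist only inside this function, just like in log_reg().
fit_in_function <- function(X, Y) {
  initial_theta <- rep(0, ncol(X))
  # X = X, Y = Y are handed on to cost() via optim()'s '...' argument.
  optim(par = initial_theta, fn = cost, X = X, Y = Y)
}

d <- make_data()
fit <- fit_in_function(d$X, d$Y)

# At theta = 0 every prediction is 0.5, so the starting cost is log(2).
start_cost <- cost(rep(0, ncol(d$X)), d$X, d$Y)
print(c(start = start_cost, optimized = fit$value))
```

Because the data travel as arguments, cost() no longer depends on what happens to be in the global environment.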

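In general, try to avoid using variables inside a function that are not explicitly declared as arguments of that function. Otherwise you will keep running into problems like this one.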
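
The scoping rule behind this advice can be reproduced in a few lines. This is a toy sketch with made-up names (x_local, uses_free_variable(), caller() and so on are all hypothetical), and it assumes no object called x_local exists in the global environment.

```r
# Defined at top level: the free variable 'x_local' is looked up where the
# function was DEFINED (the global environment), not where it is called from.
uses_free_variable <- function() x_local * 2

caller <- function() {
  x_local <- 21            # local to caller(); invisible to uses_free_variable()
  uses_free_variable()
}

# The call fails with "object 'x_local' not found"; catch it to show the outcome.
result_free <- tryCatch(caller(), error = function(e) "error")

# Fix 1: pass the value explicitly as an argument.
takes_argument <- function(x) x * 2
caller_fixed <- function() {
  x_local <- 21
  takes_argument(x_local)
}

# Fix 2: define the helper inside the caller, so it closes over the local.
caller_closure <- function() {
  x_local <- 21
  helper <- function() x_local * 2
  helper()
}

print(result_free)
print(caller_fixed())
print(caller_closure())
```

Fix 1 corresponds to adding X and Y as arguments of cost(); fix 2 corresponds to defining cost() inside log_reg() after X and Y exist.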

Also, how did I find this? I used traceback():

> traceback()
5: nrow(X) at #2
4: fn(par, ...)
3: (function (par) 
   fn(par, ...))(c(0, 0, 0, 0))
2: optim(par = initial_theta, fn = cost) at #33
1: log_reg(datafr = datafr, m = 20)
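
traceback() locates the problem after the error has occurred. As a complementary check (not part of the original answer), free variables can also be listed before running anything. The sketch below assumes the recommended codetools package is installed; cost_original and cost_fixed are hypothetical names for the two versions of cost() shown above.

```r
library(codetools)  # assumption: the recommended 'codetools' package is available

sigmoid <- function(z) 1 / (1 + exp(-z))

# The cost() from the question, unchanged apart from its name.
cost_original <- function(theta) {
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}

# The cost() from the answer, with the data as arguments.
cost_fixed <- function(theta, X, Y) {
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}

# merge = FALSE separates global functions from global variables;
# only the variables are of interest here.
free_original <- findGlobals(cost_original, merge = FALSE)$variables
free_fixed <- findGlobals(cost_fixed, merge = FALSE)$variables

print(free_original)  # the names cost_original() expects to find outside itself
print(free_fixed)     # no free variables remain once X and Y are arguments
```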