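I'm trying to implement logistic regression. The code works when I run it by hand, but when I call the function I get the error "Error in nrow(X) : object 'X' not found", even though X is defined before nrow() is called. I'm testing with the UCI "Adult" dataset.
If I run the body of the function manually, line by line, there is no error. Can anyone explain this?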
# Sigmoid function
sigmoid <- function(z){
  g <- 1/(1+exp(-z))
  return(g)
}
# Cost function
cost <- function(theta){
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}
log_reg <- function(datafr, m){
  # Train/test split
  sample <- sample(1:nrow(datafr), m)
  df_train <- datafr[sample,]
  df_test <- datafr[-sample,]
  num_features <- ncol(datafr) - 1
  num_label <- ncol(datafr)
  label_levels <- levels(datafr[, num_label])
  datafr[, num_features+1] <- ifelse(datafr[, num_label] == names(table(datafr[,num_label]))[1], 0, 1)
  # Predictor variables
  X <- as.matrix(df_train[, 1:num_features])
  X_test <- as.matrix(df_test[, 1:num_features])
  # Add a column of ones (intercept) to X
  X <- cbind(rep(1, nrow(X)), X)
  X_test <- cbind(rep(1, nrow(X_test)), X_test)
  # Response variable
  Y <- as.matrix(df_train[, num_label] )
  Y <- ifelse(Y == names(table(Y))[1], 0, 1)
  Y_test <- as.matrix(df_test[, num_label] )
  Y_test <- ifelse(Y_test == names(table(Y_test))[1], 0, 1)
  # Initial theta
  initial_theta <- rep(0, ncol(X))
  # Estimate theta with optim() (its default method is Nelder-Mead, not gradient descent)
  theta_optim <- optim(par=initial_theta, fn=cost)
  predictions <- ifelse(sigmoid(X_test%*%theta_optim$par)>=0.5, 1, 0)
  # Generalization error
  error_rate <- sum(predictions!=Y_test)/length(Y_test)
  return(error_rate)
}
### Adult Data
data <- read.table('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                   sep = ',', fill = FALSE, strip.white = TRUE)
colnames(data) <- c('age', 'workclass', 'fnlwgt', 'education',
                    'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex',
                    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income')
# Feature selection
datafr <- data[, c("age", "education_num", "hours_per_week", "income")]
log_reg(datafr = datafr, m = 20)
Answer 0 (score: 0)
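You are calling cost(), which references X, but X is not defined inside cost(). Because cost() is defined at the top level, R looks for X in cost()'s own environment and then in the global environment, not in the body of log_reg(), where X actually lives. That is also why running the code by hand works: in that case X is created in the global environment. You can either define cost() inside log_reg(), after X has been defined, or, better, make X (and Y) arguments of cost():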
cost <- function(theta, X, Y){
  n <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/n)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
  return(J)
}
Then, later in log_reg():
theta_optim <- optim(par=initial_theta, fn=cost, X=X, Y=Y)
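The call above works because optim() forwards any extra named arguments (its `...`) to fn. A minimal sketch of that mechanism follows, using synthetic data rather than the Adult dataset; the sample size, seed, and true coefficients are assumptions made purely for illustration.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Same cost as above, with X and Y passed in explicitly.
cost <- function(theta, X, Y) {
  g <- sigmoid(X %*% theta)
  mean(-Y * log(g) - (1 - Y) * log(1 - g))
}

# Synthetic data (illustrative assumption): one feature plus an intercept.
set.seed(1)
x <- rnorm(200)
X <- cbind(1, x)
Y <- rbinom(200, 1, sigmoid(0.5 + 1.5 * x))

# X = X and Y = Y are not arguments of optim() itself;
# they are passed through `...` to cost().
fit <- optim(par = c(0, 0), fn = cost, X = X, Y = Y)
print(fit$par)    # estimated intercept and slope
print(fit$value)  # cost at the estimate
```

Since cost() no longer relies on any variable outside its argument list, it behaves the same whether it is called from the global environment or from inside log_reg().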
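In general, try to avoid using variables inside a function that are not explicitly defined as arguments of that function. Otherwise you will keep running into problems like this one.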
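The scoping rule behind this advice can be reproduced in a few lines. The sketch below is a toy example, not the original code; all function and variable names (scale_by_k, k_local, and so on) are hypothetical. It shows the failure and the two fixes mentioned above: passing the value as an argument, or defining the helper inside the caller.

```r
# Free variable: k_local is looked up where scale_by_k() was defined
# (the top level), not in the function that happens to call it.
scale_by_k <- function(v) v * k_local

run_broken <- function() {
  k_local <- 10            # local to run_broken(); invisible to scale_by_k()
  scale_by_k(1:3)
}

# Fix 1: pass the value explicitly.
scale_by <- function(v, k) v * k
run_with_argument <- function() {
  k_local <- 10
  scale_by(1:3, k_local)
}

# Fix 2: define the helper inside the caller, after k_local exists (a closure).
run_with_closure <- function() {
  k_local <- 10
  scale_local <- function(v) v * k_local
  scale_local(1:3)
}

broken <- tryCatch(run_broken(), error = function(e) conditionMessage(e))
print(broken)                 # an error message mentioning 'k_local'
print(run_with_argument())    # 10 20 30
print(run_with_closure())     # 10 20 30
```

Fix 1 is usually easier to test and reuse, because the function's inputs are all visible in its signature.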
Also, how did I find this? I used traceback():
> traceback()
5: nrow(X) at #2
4: fn(par, ...)
3: (function (par)
fn(par, ...))(c(0, 0, 0, 0))
2: optim(par = initial_theta, fn = cost) at #33
1: log_reg(datafr = datafr, m = 20)
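traceback() is most convenient in an interactive session, right after an error. In a script, a similar call stack can be recorded with a different technique: withCallingHandlers() combined with sys.calls(). The following is only a sketch of that alternative; the functions outer_fn and inner_fn and the object missing_object are hypothetical stand-ins for log_reg(), cost(), and X.

```r
outer_fn <- function() inner_fn()
inner_fn <- function() nrow(missing_object)   # fails: object does not exist

captured <- NULL
msg <- tryCatch(
  withCallingHandlers(
    outer_fn(),
    # Runs while the failing calls are still on the stack.
    error = function(e) captured <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

print(msg)
# One line per recorded call, outermost first.
stack_lines <- vapply(as.list(captured),
                      function(cl) deparse(cl)[1], character(1))
print(stack_lines)
```

The recorded stack also contains the tryCatch() and handler frames, so the calls of interest (outer_fn(), inner_fn(), nrow(...)) appear in the middle of the output.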