这一行为我提供了分层目录路径,直到文件:
dirs<- as.data.frame(list.dirs(path = rootdir, full.names = F, recursive = T))
像这样:
"","list.dirs(path = rootdir, full.names = F, recursive = T)"
"1",""
"2","19"
"3","19/H"
"4","19/H/BA"
"5","19/H/BA/2016"
"6","19/H/BA/2016/11"
"7","19/H/BA/2016/11/10"
"8","19/H/BA/2016/11/10/0" # files are in here
"9","19/H/BA/2016/12"
"10","19/H/BA/2016/12/20"
"11","19/H/BA/2016/12/20/0" # files are in here
"12","19/H/BA/2017"
"13","19/H/BA/2017/1"
"14","19/H/BA/2017/1/19"
"15","19/H/BA/2017/1/19/0" # files are in here
"16","19/H/BA/2017/1/29"
"17","19/H/BA/2017/1/29/0" # files are in here
"18","19/H/BA/2017/3"
"19","19/H/BA/2017/3/20"
"20","19/H/BA/2017/3/20/0" # files are in here
但是我怎么写代码只给我文件的路径?即,
"19/H/BA/2016/11/10/0"
"19/H/BA/2016/12/20/0"
"19/H/BA/2017/1/19/0"
"19/H/BA/2017/1/29/0"
"19/H/BA/2017/3/20/0"
答案 0 :(得分:3)
您可以使用#Setting the number of folds, and number of instances in each fold
n_folds <- 5
fold_size <- nrow(dataset) %/% 5
residual <- nrow(dataset) %% 5
#label the instances based on the number of folds
cv_labels <- c(rep(1,fold_size),rep(2,fold_size), rep(3,fold_size), rep(4,fold_size), rep(5,fold_size), rep(5,residual))
# the error term would differ based on each threshold value
t_seq <- seq(0.1,0.9,by = 0.1)
index_mat <- matrix(ncol = (n_folds+1) , nrow = length(t_seq))
index_mat[,1] <- t_seq
# the main loop for calculation of the CV error on each fold
for (i in 1:5){
train <- dataset %>% filter(cv_labels != i)
test <- dataset %>% filter(cv_labels == i )
brglm_cv_model <- brglm(formula = response_var ~ . , family = "binomial", data = train )
brglm_cv_pred <- predict(object = brglm_model, newdata = test , type = "response")
# error formula that you want, e.g. misclassification
counter <- 0
for (treshold in t_seq ) {
counter <- counter + 1
conf_mat <- table( factor(test$response_var) , factor(brglm_cv_pred>treshold, levels = c("FALSE","TRUE") ))
sen <- conf_mat[2,2]/sum(conf_mat[2,])
# other indices can be computed as follows
#spec <- conf_mat[1,1]/sum(conf_mat[1,])
#prec <- conf_mat[2,2]/sum(conf_mat[,2])
#F1 <- (2*prec * sen)/(prec+sen)
#accuracy <- (conf_mat[1,1]+conf_mat[2,2])/sum(conf_mat)
#here I am only interested in sensitivity
index_mat[counter,(i+1)] <- sen
}
}
# final data.frame would be the mean of sensitivity over each threshold value
final_mat <- matrix(nrow = length(t_seq), ncol = 2 )
final_mat[,1] <- t_seq
final_mat[,2] <- apply(X = index_mat[,-1] , MARGIN = 1 , FUN = mean)
final_mat <- data.frame(final_mat)
colnames(final_mat) <- c("treshold","sensitivity")
#why not having a look at the CV-sensitivity of the model over threshold values?
ggplot(data = final_mat) +
geom_line(aes(x = treshold, y = sensitivity ), color = "blue")
代替正则表达式,这将处理dirname
或rootdir == "C:/"
等特殊情况:
rootdir == "../"
答案 1 :(得分:1)
我们可以使用list.files
来获取所有存在文件的路径(这样它就不会给我们任何空的目录路径)。
filepath = list.files(rootdir, recursive = T)
现在这将包含所有文件的路径,我们可以使用sub
从中删除文件名并仅保留目录名。
sub("[/].*", "", filepath)
这会删除/
中的所有内容。最后为了避免重复,我们可以unique
。
在一个班轮里做所有事情。
unique(sub("[/].*", "", list.files(rootdir, recursive = T)))