Question

This line gives me the hierarchical directory paths, all the way down to the files:

dirs<- as.data.frame(list.dirs(path = rootdir, full.names = F, recursive = T))

Like this:

"","list.dirs(path = rootdir, full.names = F, recursive = T)"
"1",""
"2","19"
"3","19/H"
"4","19/H/BA"
"5","19/H/BA/2016"
"6","19/H/BA/2016/11"
"7","19/H/BA/2016/11/10"
"8","19/H/BA/2016/11/10/0" # files are in here
"9","19/H/BA/2016/12"
"10","19/H/BA/2016/12/20"
"11","19/H/BA/2016/12/20/0" # files are in here
"12","19/H/BA/2017"
"13","19/H/BA/2017/1"
"14","19/H/BA/2017/1/19"
"15","19/H/BA/2017/1/19/0" # files are in here
"16","19/H/BA/2017/1/29"
"17","19/H/BA/2017/1/29/0" # files are in here
"18","19/H/BA/2017/3"
"19","19/H/BA/2017/3/20"
"20","19/H/BA/2017/3/20/0" # files are in here

But how do I write code that gives me only the paths of the directories that actually contain files? That is:

"19/H/BA/2016/11/10/0"
"19/H/BA/2016/12/20/0"
"19/H/BA/2017/1/19/0"
"19/H/BA/2017/1/29/0"
"19/H/BA/2017/3/20/0"
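If you would rather keep the list.dirs() call from the question, one possible approach (a sketch, not taken from either answer below) is to keep only the directories that directly contain at least one non-directory entry. The tree is a hypothetical stand-in built under tempdir(); the folder names mirror the question, and the file name data.csv is made up.

```r
# Hypothetical demo tree mirroring two of the question's leaf folders.
rootdir <- file.path(tempdir(), "demo_root_dirs")
leaves <- c("19/H/BA/2016/11/10/0", "19/H/BA/2017/1/19/0")
for (d in leaves) {
  dir.create(file.path(rootdir, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(rootdir, d, "data.csv"))  # made-up file name
}

# All directories relative to rootdir; the first element is "" (rootdir itself).
dirs <- list.dirs(path = rootdir, full.names = FALSE, recursive = TRUE)

# Keep a directory only if it directly holds at least one entry that is not a
# directory. Hidden files are ignored because list.files() uses all.files = FALSE.
has_files <- vapply(dirs, function(d) {
  entries <- list.files(file.path(rootdir, d), full.names = TRUE)
  any(!dir.exists(entries))
}, logical(1))

file_dirs <- dirs[has_files]
print(file_dirs)
```

This calls list.files() once per directory, so on a very large tree the list.files()-based answers below are likely cheaper.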

Answer 1

You can use dirname instead of a regular expression to strip the file names; this handles special cases such as rootdir == "C:/" or rootdir == "../".
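A minimal sketch of the dirname approach, assuming a small hypothetical tree under tempdir() (the file names a.csv, b.csv and top.txt are made up for the demo):

```r
# Hypothetical demo tree: two leaf folders with two files each, plus one file in rootdir.
rootdir <- file.path(tempdir(), "demo_root_dirname")
leaves <- c("19/H/BA/2016/11/10/0", "19/H/BA/2017/3/20/0")
for (d in leaves) {
  dir.create(file.path(rootdir, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(rootdir, d, c("a.csv", "b.csv")))
}
file.create(file.path(rootdir, "top.txt"))

# list.files() returns only files (relative to rootdir), so empty folders never
# appear; dirname() drops the last path component; unique() removes repeats.
file_dirs <- unique(dirname(list.files(rootdir, recursive = TRUE)))
print(file_dirs)
```

A file sitting directly in rootdir shows up as "." in the result.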

Answer 2

We can use list.files to get the paths of all existing files. Because it lists only files, it never returns the path of an empty directory.

filepath <- list.files(rootdir, recursive = TRUE)
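A quick check of that claim on a hypothetical tree containing one empty folder (19/H/empty and a.csv are invented for the demo):

```r
rootdir <- file.path(tempdir(), "demo_root_listfiles")
dir.create(file.path(rootdir, "19/H/BA/2016/11/10/0"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(rootdir, "19/H/empty"), recursive = TRUE, showWarnings = FALSE)  # made-up empty folder
file.create(file.path(rootdir, "19/H/BA/2016/11/10/0", "a.csv"))

# Only the file shows up; the empty folder and intermediate folders do not.
filepath <- list.files(rootdir, recursive = TRUE)
print(filepath)  # "19/H/BA/2016/11/10/0/a.csv"

# list.dirs(), by contrast, reports every folder, including the empty one.
all_dirs <- list.dirs(rootdir, full.names = FALSE, recursive = TRUE)
print("19/H/empty" %in% all_dirs)
```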

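filepath now contains the paths of all the files, so we can use sub to strip the file name from each one and keep only the directory part.

sub("/[^/]*$", "", filepath)

This deletes the last / and everything after it (the file name). Finally, to remove duplicates, we can wrap the result in unique.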
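To see why the pattern needs to be anchored on the last slash, here is a small comparison on hypothetical paths. A pattern such as "[/].*" matches from the first slash to the end, so only the top-level folder survives:

```r
# Made-up relative paths, in the form list.files(rootdir, recursive = TRUE) returns.
filepath <- c("19/H/BA/2016/11/10/0/a.csv",
              "19/H/BA/2017/1/19/0/b.csv",
              "top.txt")

# Matches from the FIRST slash onward: keeps only the top-level folder.
first_only <- sub("[/].*", "", filepath)
print(first_only)  # "19" "19" "top.txt"

# Matches only the LAST slash and the file name after it: keeps the directory part.
dir_part <- sub("/[^/]*$", "", filepath)
print(dir_part)    # "19/H/BA/2016/11/10/0" "19/H/BA/2017/1/19/0" "top.txt"
```

A path without any slash (a file directly in rootdir) is returned unchanged by sub(), which is one of the edge cases where dirname() behaves differently.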

Doing it all in one line:

unique(sub("/[^/]*$", "", list.files(rootdir, recursive = TRUE)))
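The one-liner can be wrapped in a small function for reuse. dirs_with_files is a hypothetical name, and the tree below is again a made-up example under tempdir():

```r
# Hypothetical helper: directories (relative to rootdir) that contain at least one file.
dirs_with_files <- function(rootdir) {
  unique(sub("/[^/]*$", "", list.files(rootdir, recursive = TRUE)))
}

# Made-up tree with two files in each of three leaf folders.
rootdir <- file.path(tempdir(), "demo_root_oneliner")
leaves <- c("19/H/BA/2016/11/10/0", "19/H/BA/2016/12/20/0", "19/H/BA/2017/1/29/0")
for (d in leaves) {
  dir.create(file.path(rootdir, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(rootdir, d, c("x.csv", "y.csv")))
}

res <- dirs_with_files(rootdir)
print(res)
```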

List directories only down to the level that contains my files
