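I am trying to build a CNN model for image classification in R, but my training data is large (1.7 GB, https://www.kaggle.com/c/plant-seedlings-classification/data). I am therefore trying to walk through all the files and collect their file sizes in a data frame, so that I can drop the heavy images from the training dataset in my code. Here is a snippet of the sample code: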
#Block 1 : creating a data frame of all the subfolder and image file in them
df_trainfiles <- data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)
df_testfiles<-data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)
df_train<-data.frame(info=character(),stringsAsFactors = F)
df_test<-data.frame(info=character(),stringsAsFactors = F)
trainDataPath<-"C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
lsSubfolder<-list.files(path = trainDataPath,pattern = )
for (intX in 1:length(lsSubfolder)){
lsfiles<-list.files(path = paste0(trainDataPath,"/",lsSubfolder[intX]))
for(intY in 1:length(lsfiles)){
df_trainfiles[nrow(df_trainfiles)+1,]<-list(nrow(df_trainfiles)+1, lsSubfolder[intX],lsfiles[intY],file.size(paste0(trainDataPath,"/", df_trainfiles[i,2],"/", df_trainfiles[i,3],sep="")))
}
}
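When I view df_trainfiles after running the code, the filesize field shows "NA". I have tried a few other approaches that I found on other forums, but none of them worked.
Any help is much appreciated! Thanks :)
Answer 0 (score: 1)
My suggestion is not to use a for loop, since there are more powerful ways to list files and read their properties.
Here is a proposal: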
trainDataPath <- "C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
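# Full paths of every png file under the train folder, searched recursively
f <- list.files(path = trainDataPath, pattern = "png", recursive = TRUE, full.names=TRUE)
# Same files as paths relative to trainDataPath, i.e. "subfolder/file.png"
filename <- list.files(path = trainDataPath, pattern = "png", recursive = TRUE)
# The first path component is the subfolder (class) name
foldername <- sapply(strsplit(filename, "/"), "[", 1)
# file.size() is vectorised, so a single call returns the size in bytes of every file
filesize <- file.size(f)
df_trainfiles <- data.frame(foldername, filename, filesize, stringsAsFactors = F)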
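As for why the loop in the question ends up with NA: file.size() returns NA for any path that does not exist. The path there is built from df_trainfiles[i,2] and df_trainfiles[i,3], where i is not the loop variable and the row has not been written yet, so it presumably does not point at a real file. Here is a minimal sketch of both points, plus the filtering step the question is aiming for. The temporary folder, the file names, the df_demo and df_small names, and the 1000-byte threshold are all made up for illustration and are not part of the Kaggle data.

```r
# Hypothetical demo folder standing in for the real train directory
demo_path <- tempfile("train_demo")
dir.create(file.path(demo_path, "Maize"), recursive = TRUE)
dir.create(file.path(demo_path, "Charlock"))

# Dummy "images" of known sizes (10, 5000 and 20 bytes)
writeBin(raw(10),   file.path(demo_path, "Maize", "a.png"))
writeBin(raw(5000), file.path(demo_path, "Maize", "b.png"))
writeBin(raw(20),   file.path(demo_path, "Charlock", "c.png"))

# Vectorised listing: one call for the paths, one call for the sizes
f <- list.files(demo_path, pattern = "\\.png$", recursive = TRUE, full.names = TRUE)
df_demo <- data.frame(
  foldername = basename(dirname(f)),  # subfolder (class) name
  filename   = basename(f),           # file name without the folder
  filesize   = file.size(f),          # size in bytes
  stringsAsFactors = FALSE
)

# A path that does not exist gives NA instead of an error
missing_size <- file.size(file.path(demo_path, "no_such_folder", "no_such_file.png"))

# Keep only the images at or below an (assumed) size threshold in bytes
max_bytes <- 1000
df_small <- df_demo[df_demo$filesize <= max_bytes, ]
```

On the real data you would point demo_path at trainDataPath, skip the dummy files, and choose max_bytes to suit your memory budget.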