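I'm trying to avoid duplicating code when looping over two sets of files (the 'yes' and 'no' model training files). To do that, I combined the two vectors of filenames into a data.frame, with an extra bit of metadata that records whether each file is a "yes" file or a "no" file. The resulting data structure looks correct, but I can't figure out how to loop over the data.frame.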
Maybe the best solution is to combine the two vectors into a different kind of data structure (i.e. not a data.frame)?
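On the question of a different data structure: a named list is one option. The sketch below is an illustration, not the asker's code (the name `filesByResult` is hypothetical). It stores each label's filenames under that label, so the list names carry the "yes"/"no" metadata and one nested loop covers both sets without duplicated code.

```r
yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")
noFiles  = c("nFile1", "nFile2", "nFile3", "nFile4")

# Hypothetical alternative structure: a named list keyed by the label.
filesByResult = list(yes = yesFiles, no = noFiles)

# Outer loop over the labels, inner loop over that label's files.
for (result in names(filesByResult)) {
  for (file in filesByResult[[result]]) {
    cat(sep = "", "File '", file, "' is a '", result, "' file.\n")
  }
}
```

The trade-off is that per-file metadata beyond the label would need another structure, whereas a data.frame can grow extra columns.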
> yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")
> noFiles = c("nFile1", "nFile2", "nFile3", "nFile4")
> allFiles = data.frame(result=c(rep("yes", times=length(yesFiles)), rep("no", times=length(noFiles))), name=c(yesFiles, noFiles))
> allFiles
  result   name
1    yes yFile1
2    yes yFile2
3    yes yFile3
4    yes yFile4
5     no nFile1
6     no nFile2
7     no nFile3
8     no nFile4
>
> for (file in allFiles) { cat(sep="", file$result, ": ", file$name, "\n") }
Error in file$result : $ operator is invalid for atomic vectors
>
> for (file in allFiles) { cat(sep="", file['result'], ": ", file['name'], "\n") }
NA: NA
NA: NA
>
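The loop seems to iterate over the columns, not the rows. How do I make it iterate over the rows? Or is there a better way to combine the data so that I can process both sets in a single loop?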
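The column-wise behavior is expected: a data.frame is stored as a list of equal-length columns, so `for (x in df)` visits each column in turn. A minimal sketch of looping over rows by index instead; `stringsAsFactors = FALSE` is set explicitly (an addition to the question's code) so both columns stay character vectors in every R version.

```r
yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")
noFiles  = c("nFile1", "nFile2", "nFile3", "nFile4")
allFiles = data.frame(result = c(rep("yes", length(yesFiles)), rep("no", length(noFiles))),
                      name = c(yesFiles, noFiles),
                      stringsAsFactors = FALSE)

# A data.frame is a list of columns: length() counts columns, nrow() counts rows.
length(allFiles)  # returns 2
nrow(allFiles)    # returns 8

# This is what the question's loop does: each iteration gets a whole column.
for (column in allFiles) print(head(column, 2))

# To visit rows, loop over row indices and index each column with the same i.
for (i in seq_len(nrow(allFiles))) {
  cat(sep = "", allFiles$result[i], ": ", allFiles$name[i], "\n")
}
```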
Then I tried looping over the same structure in a different way, but it still doesn't work...
> yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")
> noFiles = c("nFile1", "nFile2", "nFile3", "nFile4")
> allFiles = data.frame(result=c(rep("yes", times=length(yesFiles)), rep("no", times=length(noFiles))), name=c(yesFiles, noFiles))
> allFiles
  result   name
1    yes yFile1
2    yes yFile2
3    yes yFile3
4    yes yFile4
5     no nFile1
6     no nFile2
7     no nFile3
8     no nFile4
>
> allFiles[1,1]
[1] yes
Levels: no yes
> allFiles[1,2]
[1] yFile1
Levels: nFile1 nFile2 nFile3 nFile4 yFile1 yFile2 yFile3 yFile4
> # ...ah, great! These seem to be giving me what I need.
>
> for (i in 1:nrow(allFiles)) {
+ result = allFiles[i,1]
+ file = allFiles[i,2]
+ cat(sep="", "File '", file, "' is a '", result, "' file.\n")
+ }
File '5' is a '2' file.
File '6' is a '2' file.
File '7' is a '2' file.
File '8' is a '2' file.
File '1' is a '1' file.
File '2' is a '1' file.
File '3' is a '1' file.
File '4' is a '1' file.
> # ...wha? What's up with the numbers? I thought [1,1], etc, gave strings!
What am I doing wrong?
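The numbers are factor codes. In R versions before 4.0.0, `data.frame()` defaulted to `stringsAsFactors = TRUE`, so both columns became factors (hence the `Levels:` lines above), and `cat()` prints a factor's integer codes rather than its labels. Levels are sorted alphabetically, so "no" is code 1 and "yes" is code 2, and likewise `yFile1` is code 5 after the four `nFile` levels, which is exactly what the loop printed. A small sketch, building the factor explicitly so it behaves the same in any R version:

```r
# Build a factor explicitly (older data.frame() did this implicitly).
result = factor(c("yes", "no"))

levels(result)        # returns c("no", "yes"): levels are sorted alphabetically
as.integer(result)    # returns c(2L, 1L): the codes that cat() ends up printing
as.character(result)  # returns c("yes", "no"): the labels you actually want

cat(result, "\n")                # prints the codes, not the labels
cat(as.character(result), "\n")  # prints the labels
```

So either convert with `as.character()` before printing, or create the data.frame with `stringsAsFactors = FALSE` so the columns never become factors.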
Here is some additional information about what I actually need to do inside the loop, in response to Colonel Beauvel's comment below his answer:
First, I need a utility function to convert the text timestamp on each line of the .csv file:
#-----------------------------------------------
# Read a text timestamp of the form "yyyy-mm-ddThh:mm:ss.xxx",
# where xxx=milliseconds. Returns a numeric value of the seconds
# since Jan 1 1970, with millisecond precision (i.e. 3 decimal places).
#
readTimestamp = function (tstamp) {
as.numeric(strptime(tstamp,format='%Y-%m-%dT%H:%M:%S.')) +
as.numeric(substr(tstamp,20,23))
}
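`strptime()` also has an R-specific `%OS` conversion that, on input, reads seconds including the fractional part, so the `substr()` arithmetic above is not needed. A sketch of that variant; the name `readTimestampOS` is hypothetical, and `tz = "UTC"` is an assumption (the original uses the local time zone), so substitute the zone the timestamps were actually recorded in.

```r
# Hypothetical alternative to readTimestamp(): "%OS" parses "ss.xxx" in one step.
# tz = "UTC" is an assumption; the original function uses the local time zone.
readTimestampOS = function(tstamp) {
  as.numeric(as.POSIXct(strptime(tstamp, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")))
}

readTimestampOS("1970-01-01T00:00:01.250")  # returns 1.25
```

Like the original, it is vectorized, so it can be applied to a whole column such as `tmp[,1]` at once.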
Now, here is the loop I want to run (the code isn't debugged yet, so I'm sure there's something wrong with it):
colnamesToKeep = union("Seconds", sensorNamesForThisModel)
dataset = list() # Eventually 'dataset' will hold all training data from all files
for (file in allFiles)
{
cat(sep="", "Reading '", file['result'], "' file \"", file['name'], "\".\n")
tmp = read.csv(file['name'], na.strings=c(".", "NA", "", "?"), strip.white=TRUE, encoding="UTF-8")
attr(tmp, "names")[1] = "Seconds" # Rename column 1 to "Seconds" (it's not yet, but it will be)
tmp = tmp[,-2:-4] # Delete these columns; they're irrelevant to the KSVM model
beginTime = readTimestamp(tmp[1,1])
# Convert column 1 from text timestamps to numeric seconds (msec precision) starting at 0.000
tmp[,1] = readTimestamp(tmp[,1]) - beginTime
# Delete all columns for sensors that this model cares nothing about...
colIndicesToDelete = -which(!(colnames(tmp) %in% colnamesToKeep))
tmp = tmp[,colIndicesToDelete] # Delete all columns for sensors that this model cares nothing about
dataset[[length(dataset)+1]] = list(result=file['result'], data=tmp) # Add this to the training dataset
}
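A sketch of how this loop might look once it iterates over rows rather than columns. It assumes the CSV layout described above (text timestamp in column 1, columns 2-4 irrelevant) and is wrapped in a hypothetical function `loadTrainingData()` so it can be called and tested. It also changes two things that are not in the original: the timestamp is parsed with `strptime()`'s `%OS` conversion and an assumed `tz = "UTC"`, and the negative `which()` index is replaced with a logical one. That second change matters because when no column needs deleting, `-which(...)` is `integer(0)`, and `tmp[, integer(0)]` drops every column.

```r
# Hypothetical wrapper around the question's loop body.
# Assumes column 1 holds "yyyy-mm-ddThh:mm:ss.xxx" timestamps and columns 2-4 are unused.
loadTrainingData = function(allFiles, sensorNamesForThisModel) {
  colnamesToKeep = union("Seconds", sensorNamesForThisModel)
  dataset = vector("list", nrow(allFiles))      # one entry per file
  for (i in seq_len(nrow(allFiles))) {          # loop over rows, not columns
    result = as.character(allFiles$result[i])   # as.character() guards against factors
    name   = as.character(allFiles$name[i])
    cat(sep = "", "Reading '", result, "' file \"", name, "\".\n")
    tmp = read.csv(name, na.strings = c(".", "NA", "", "?"), strip.white = TRUE,
                   encoding = "UTF-8", stringsAsFactors = FALSE)
    names(tmp)[1] = "Seconds"                   # rename column 1
    tmp = tmp[, -(2:4), drop = FALSE]           # drop the irrelevant columns 2-4
    # Assumption: timestamps are UTC; "%OS" parses the fractional seconds directly.
    secs = as.numeric(as.POSIXct(strptime(tmp[, 1], format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")))
    tmp[, 1] = secs - secs[1]                   # seconds relative to the first row
    # Logical indexing keeps the wanted columns and, unlike -which(...),
    # does not drop everything when there is nothing to delete.
    tmp = tmp[, colnames(tmp) %in% colnamesToKeep, drop = FALSE]
    dataset[[i]] = list(result = result, data = tmp)
  }
  dataset
}
```

Usage would be `dataset = loadTrainingData(allFiles, sensorNamesForThisModel)`, with `allFiles` built as in the question.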
I'm open to any & all suggestions, especially ones like "you shouldn't use union() to create the colnamesToKeep variable". Thank you very much!
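On the `union()` question: for plain character vectors, `union(x, y)` gives the same result as `unique(c(x, y))`, so it is a reasonable way to build `colnamesToKeep`. It guarantees that "Seconds" comes first and appears only once, even if it is also in the sensor list. The sensor names below are hypothetical.

```r
# Hypothetical sensor names; "Seconds" is deliberately duplicated.
sensorNamesForThisModel = c("accelX", "accelY", "Seconds")

colnamesToKeep = union("Seconds", sensorNamesForThisModel)

# Same result as de-duplicating the concatenation.
stopifnot(identical(colnamesToKeep, unique(c("Seconds", sensorNamesForThisModel))))
colnamesToKeep  # returns c("Seconds", "accelX", "accelY")
```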
Answer 0 (score: 0)
I figured it out, as shown below. But I'm still very open to suggestions for a better way.
> yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")
> noFiles = c("nFile1", "nFile2", "nFile3", "nFile4")
> allFiles = data.frame(result=c(rep("yes", times=length(yesFiles)), rep("no", times=length(noFiles))), name=c(yesFiles, noFiles))
> allFiles
  result   name
1    yes yFile1
2    yes yFile2
3    yes yFile3
4    yes yFile4
5     no nFile1
6     no nFile2
7     no nFile3
8     no nFile4
>
> for (i in 1:nrow(allFiles)) {
+ result = as.character(allFiles[[i,1]])
+ file = as.character(allFiles[[i,2]])
+ cat(sep="", "File '", file, "' is a '", result, "' file.\n")
+ }
File 'yFile1' is a 'yes' file.
File 'yFile2' is a 'yes' file.
File 'yFile3' is a 'yes' file.
File 'yFile4' is a 'yes' file.
File 'nFile1' is a 'no' file.
File 'nFile2' is a 'no' file.
File 'nFile3' is a 'no' file.
File 'nFile4' is a 'no' file.
>
Answer 1 (score: 0)
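Try looping over an index i that runs from 1 to the number of files, and use it to subset the data frame inside the loop. You can extract a column's value with df$column[i]:
yesFiles = c("yFile1", "yFile2", "yFile3", "yFile4")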
noFiles = c("nFile1", "nFile2", "nFile3", "nFile4")
files = data.frame(result=c(rep("yes", times=length(yesFiles)), rep("no", times=length(noFiles))),
name=c(yesFiles, noFiles),
stringsAsFactors=FALSE)
files
for (i in seq_len(nrow(files))) {
cat(sep="", files$result[i], ": ", files$name[i], "\n")
# Do other stuff here; the file path is available via files$name[i]
}
yes: yFile1
yes: yFile2
yes: yFile3
yes: yFile4
no: nFile1
no: nFile2
no: nFile3
no: nFile4
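Another option, offered as an alternative rather than as part of the answer above: `Map()`, a wrapper around `mapply()`, walks the two columns in parallel and calls the function once per row with one element from each, so no index variable is needed. Because it returns a list, it also fits the question's goal of collecting one entry per file into `dataset`.

```r
files = data.frame(result = c("yes", "yes", "no"),
                   name   = c("yFile1", "yFile2", "nFile1"),
                   stringsAsFactors = FALSE)

# Map() pairs files$result[i] with files$name[i] and returns a list of the results.
messages = Map(function(result, name) {
  # Real per-file work (e.g. read.csv(name)) would go here.
  paste0(result, ": ", name)
}, files$result, files$name)

cat(unlist(messages), sep = "\n")
```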