Trying to create a 'macro' / extract to a new data.frame

Asked: 2013-05-30 22:54:17

Tags: r

My problem is that I keep getting the following error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
no lines available in input

from:

setwd("C:/")

lf = list.files(pattern=".csv") 

treat_file = function(f){
  ff = sub("\\.[[:alnum:]]*$", "", f)
  d = read.csv(f, skip=2, sep=",")
  Var2 = sum(d[,3]*d[,5])
  Var3 = 10000*(1/(sum((d[,1]*d[,2])^2)))
  c(as.numeric(ff), Var2, Var3)
} 

newdata = sapply(lf, treat_file)

The .csv files look like this:

Final Score: 570

Final X, Starting X, Score, Velocity, Success
-6,-210,100,3,1
-19,-279,70,4,0
2,-229,90,3,1
0,-210,100,3,1
19,-329,50,4,0
17,-279,70,4,0
etc,
etc,
etc,
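To check how `read.csv(f, skip=2)` handles this layout, here is a minimal sketch that parses the six sample rows from an in-memory string rather than from a file. It assumes the blank line after `Final Score: 570` really is present in the file, so that skipping two lines lands on the header row; the name `sample_csv` is made up for the illustration.

```r
# The sample rows from above, held as a single string (hypothetical name)
sample_csv = paste(c(
  "Final Score: 570",
  "",
  "Final X, Starting X, Score, Velocity, Success",
  "-6,-210,100,3,1",
  "-19,-279,70,4,0",
  "2,-229,90,3,1",
  "0,-210,100,3,1",
  "19,-329,50,4,0",
  "17,-279,70,4,0"
), collapse = "\n")

# skip = 2 drops the "Final Score" line and the blank line,
# so the third line is used as the header
d = read.csv(text = sample_csv, skip = 2)

# Score (column 3) counted only on successful trials (column 5)
score = sum(d[, 3] * d[, 5])
print(dim(d))
print(score)  # 290 for these six rows
```

If the real files have no blank second line, `skip = 1` would be the matching setting.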

Final code

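It turned out that one of the files was empty, and that was what tripped everything up. Adding a message call inside the function and using lapply showed me where things went wrong, and now everything works.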

setwd("C:/")

# find all the .csv files (the pattern is a regular expression)
lf = list.files(pattern="\\.csv$")
# make sure they're there
View(lf)

# this function works on a single file
treat_file = function(f){
  # report which file is being read, so a failure (e.g. a blank .csv file) can be traced
  message("currently reading: ", f)
  # strip the .csv extension from the file name
  ff = sub("\\.[[:alnum:]]*$", "", f)
  # read in the .csv file, skipping the two lines above the header
  d = read.csv(f, skip=2, sep=",")
  # create a score variable
  Var2 = sum(d[,3]*d[,5])
  # create a continuous score variable
  Var3 = 10000000*(1/(sum(sqrt((d[,1]*d[,3])^2))))
  # combine the three variables
  c(as.numeric(ff), Var2, Var3)
}
# a second way of checking how importing the .csv files is going:
# shows the number of fields on each line of each file
lapply(lf, count.fields, sep=",")

# apply the function to all .csv files and transpose the result,
# giving a matrix with one row per file
newdata = t(sapply(lf, treat_file))
# change column names
colnames(newdata) = c("PIN", "score", "continuous")
# make sure everything looks good
View(newdata)
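Since the original failure came from a single empty file, one possible variation is to let the function skip files it cannot read instead of stopping. The sketch below is not part of the code above; `safe_treat_file` is a hypothetical name, and the demo files are written to a temporary directory purely for illustration. It catches the read.csv error with tryCatch and drops the failed files before binding the rest into a data.frame.

```r
# Hypothetical helper: returns NULL for a file that is empty or unreadable
safe_treat_file = function(f) {
  d = tryCatch(read.csv(f, skip = 2), error = function(e) NULL)
  if (is.null(d) || nrow(d) == 0) {
    message("skipping unreadable or empty file: ", f)
    return(NULL)
  }
  # subject number taken from the file name, e.g. "1254.csv" gives 1254
  pin = as.numeric(sub("\\.[[:alnum:]]*$", "", basename(f)))
  c(PIN = pin,
    score = sum(d[, 3] * d[, 5]),
    continuous = 10000000 / sum(abs(d[, 1] * d[, 3])))
}

# Demo data: one valid file and one empty file in a temporary directory
tmp = tempfile("demo_")
dir.create(tmp)
writeLines(c("Final Score: 570", "",
             "Final X, Starting X, Score, Velocity, Success",
             "-6,-210,100,3,1", "-19,-279,70,4,0", "2,-229,90,3,1"),
           file.path(tmp, "1254.csv"))
file.create(file.path(tmp, "9999.csv"))

lf = list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
# keep only the files that produced a result
results = Filter(Negate(is.null), lapply(lf, safe_treat_file))
newdata = as.data.frame(do.call(rbind, results))
print(newdata)
```

Here `abs(x)` stands in for `sqrt(x^2)`, which gives the same value. Whether silently skipping a subject is acceptable depends on the experiment, so the message call is kept to make every skipped file visible.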

Original post

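Grad student here trying to make my life easier, but programming in R is not my forte. I would really appreciate some help. I ran an experiment that gives me a .csv output for each subject, named like 1254.csv, where the four-digit number is unique to each person. My goal is to end up with a data.frame in which the first variable is each person's unique subject number and the second and third variables are numbers calculated from each .csv file. I figure I should be able to do something like: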

object (or environment) = all .csv files #need help figuring out exactly how I get it into a workable object or what-have-you
for(i in 1:ncol (csvfileobject)) { 
Var1$newdata.frame = nameof i 
Var2$newdata.frame = (sum up the numbers in column2 for each csvfile) 
Var3$newdata.frame = (multiply columns 2 and 5 and sum that up for each csvfile) }

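Obviously I'm not looking for someone to do all the "work" for me, but I'm pretty lost on the programming side of R and could use some direction. Thanks!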

1 answer:

Answer 0 (score: 2)

I pretty much stopped reading after "My goal is to end up with an excel sheet", but anyway, here's a sketch:

# find all the text files
lf = list.files(pattern="\\.txt$")


# this function works on a single file
treat_file = function(f, ...){

  # magic to strip the filename extension
  ff = sub("\\.[[:alnum:]]*$", "", f)
  # read the data into a data.frame
  d = read.table(f, ...)
  # calculate some stuff with the data
  Var2 = sum(d[ ,2]) # summing all the second column
  Var3 = sum(d[ ,2]*d[ ,5]) # etc.

  # results to be returned
  c(as.numeric(ff), Var2, Var3)

} 

# now we apply the function to all files
sapply(lf, treat_file)
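The `...` in the sketch above is forwarded to read.table, so the file layout can be described at the call site. As a rough illustration, assuming comma-separated files with two lines to skip before the header (as in the question), the call might look like the following. The temporary directory and the two demo files are invented for the example.

```r
# Same function as in the sketch, with basename() added so full paths also work
treat_file = function(f, ...) {
  ff = sub("\\.[[:alnum:]]*$", "", basename(f))
  d = read.table(f, ...)
  c(as.numeric(ff), sum(d[, 2]), sum(d[, 2] * d[, 5]))
}

# Two small demo files named after made-up subject numbers
tmp = tempfile("answer_demo_")
dir.create(tmp)
for (id in c("1254", "1301")) {
  writeLines(c("Final Score: 570", "",
               "Final X, Starting X, Score, Velocity, Success",
               "-6,-210,100,3,1", "-19,-279,70,4,0"),
             file.path(tmp, paste0(id, ".csv")))
}

lf = list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# header, sep and skip travel through sapply's ... into read.table
res = sapply(lf, treat_file, header = TRUE, sep = ",", skip = 2)

# sapply returns one column per file, so transpose to get one row per subject
newdata = as.data.frame(t(res))
names(newdata) = c("PIN", "Var2", "Var3")
rownames(newdata) = NULL
print(newdata)
```

The transpose matters because sapply binds each file's result as a column; without it the subject numbers would end up in the first row rather than the first column.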