Aggregating output from multiple input files in R

Time: 2012-11-19 22:25:59

Tags: r count aggregate

I currently have the R code below. It reads in data that looks like this:

track_id    day hour    month   year    rate    gate_id pres_inter  vmax_inter
9   10  0   7   1   9.6451E-06  2   97809   23.545
9   10  0   7   1   9.6451E-06  17  100170  13.843
10  3   6   7   1   9.6451E-06  2   96662   31.568
13  22  12  8   1   9.6451E-06  1   94449   48.466
13  22  12  8   1   9.6451E-06  17  96749   30.55
16  13  0   8   1   9.6451E-06  4   98702   19.205
16  13  0   8   1   9.6451E-06  16  98585   18.143
19  27  6   9   1   9.6451E-06  9   98838   20.053
19  27  6   9   1   9.6451E-06  17  99221   17.677
30  13  12  6   2   9.6451E-06  2   97876   27.687
30  13  12  6   2   9.6451E-06  16  99842   18.163
32  20  18  6   2   9.6451E-06  1   99307   17.527


##################################################################
# Input / Output variables
##################################################################
library(plyr)  # provides ddply() and count() used below
for (N in (59:96)){
  # Zero-pad the track number to four digits, e.g. 59 -> "0059"
  TrackID <- sprintf("%04d", N)
  print(TrackID)

# For 2010_08_24 trackset
#  fname_in <- paste('input/2010_08_24/intersections_track_calibrated_jma_from1951_',TrackID,'.csv', sep="")
#  fname_out <- paste('output/2010_08_24/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
# For 2012_05_01 trackset
  fname_in <- paste('input/2012_05_01/intersections_track_param_',TrackID,'.csv', sep="")
  fname_out <- paste('output/2012_05_01/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
  fname_out2 <- paste('output/2012_05_01/GateID_',TrackID,'.csv', sep="")

#######################################################################
# we read the gate crossing track data
  cat('reading the crosstat output file', fname_in, '\n')
  header <- read.table(fname_in, nrows=1)
  track <- read.table(fname_in, sep=',', skip=1)
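  # NOTE: the sample header above lists day, hour, month, year; check that the order below matches the file
  colnames(track) <- c("ID", "day", "month", "year", "hour", "rate", "gate_id", "pres_inter", "vmax_inter")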

#  track_id=track[,1]
#  pres_inter=track[,15]

# For each storm ID, keep the row with the maximum vmax_inter
  ByTrack <- ddply(track, "ID", function(x) x[which.max(x$vmax_inter),])
# Count the number of crossings per gate_id
  ByGate <- count(track, vars="gate_id")

# Write the output file with a single record per storm                     
  cat('Writing the full output file', fname_out, '\n')
  write.table(ByTrack, fname_out, col.names=T, row.names=F, sep = ',')

# Write the output file with the crossing frequency per gate
  cat('Writing the gate frequency file', fname_out2, '\n')
  write.table(ByGate, fname_out2, col.names=T, row.names=F, sep = ',')
}

The last part of my code produces a file grouped by gate ID, giving the frequency of occurrence for each gate. It looks like this:

gate_id freq
1   935
2   2096
3   1363
4   963
5   167
6   17
7   43
8   62
9   208
10  267
11  64
12  162
13  178
14  632
15  807
16  2003
17  838
18  293

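The problem is that I get one file like this for each of the 96 different input files. Instead of writing 96 separate files, I would like to compute these aggregates for each input file, then sum the frequencies across all 96 inputs and print a SINGLE output file. Can anyone help?

Thanks, K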

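1 answer:

Answer 0: (score: 1)

You will want a function along the following lines. It picks up every .csv file in the current working directory, so that directory must contain only the files you want to analyze.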

myFun <- function(out.file = "mydata") {
  files <- list.files(pattern = "\\.(csv|CSV)$")
  # Use this next line if you are going to use the file name as a variable/output etc.
  files.noext <- substr(basename(files), 1, nchar(basename(files)) - 4)

  for (i in seq_along(files)) {
    temp <- read.csv(files[i], header = FALSE)
    # YOUR CODE HERE
    # Use the code you have already written but operate on files[i] or temp
    # Save the important stuff into one data frame that grows
    # Think carefully ahead of time what structure makes the most sense
  }

  datafile <- paste(out.file, ".csv", sep = "")
  # yourDataFrame is a placeholder for the data frame built up in the loop
  write.csv(yourDataFrame, file = datafile)
}
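
One way the "YOUR CODE HERE" step might be filled in is sketched below. It is only an illustration under some assumptions: each input file is comma-separated with a single header line and has gate_id in column 7, as in the question. The function name sum_gate_counts and the output file name GateID_all.csv are hypothetical, and base R (table and aggregate) is used in place of plyr's count so that the sketch has no package dependency.

```r
# Hypothetical helper: count gate_id per file, then sum the counts over all files.
# Assumes comma-separated input with one header line and gate_id in column 7.
sum_gate_counts <- function(files) {
  per_file <- lapply(files, function(f) {
    track <- read.table(f, sep = ",", skip = 1)
    # Frequency of each gate_id within this one file
    counts <- as.data.frame(table(gate_id = track[[7]]),
                            responseName = "freq",
                            stringsAsFactors = FALSE)
    counts$gate_id <- as.integer(counts$gate_id)
    counts
  })
  # Stack the per-file tables and add up the frequencies for each gate
  all_counts <- do.call(rbind, per_file)
  aggregate(freq ~ gate_id, data = all_counts, FUN = sum)
}

# Possible usage with the paths from the question (output name is made up):
# files  <- sprintf("input/2012_05_01/intersections_track_param_%04d.csv", 59:96)
# totals <- sum_gate_counts(files)
# write.csv(totals, "output/2012_05_01/GateID_all.csv", row.names = FALSE)
```

Passing an explicit vector of file names, rather than listing a whole directory, avoids the requirement that the directory contain nothing but the input files. A gate that never appears in any file will simply be absent from the result instead of showing a frequency of zero.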