Question

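I am trying to extract data from several files in NetCDF (.nc) format. The code I wrote works, but it is slow, and I would appreciate any suggestions!

My working code uses RNetCDF to read one file into an object called temp, which comes back as a "long list" in which each variable has either one or three dimensions (lat, lon, and pft). I then extract the data from each variable one at a time (varlist[j]) and convert it to a data frame, which is broken out along each of the three dimensions. The last step, filling in files, lets me use cbind and rbind to put all the files into one large data frame.

The code is as follows: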

library(RNetCDF)  #read.nc, open.nc
library(plyr)     #adply

setwd("C:/Users/User/Box Sync/_PhD/PhD_Research/Albedo/Data_CLM/PFTRuns/2005/")
fname<-"b40.20th.1deg.bdrd.002bc.clm2.h0.2005-"
numlist<-c('01','02','03','04','05','06','07','08','09','10','11','12')
varlist<-c(1,2,4,8,9,21)
varname<-c("lon","lat","pft","pft_wtgcell","pft_wtcol","FSR")
files<-matrix(data=NA, nrow=12, ncol=length(varlist))

#12:12 runs only the last month (numlist[12])
for (i in 12:12) {
  #build the file name and read the whole file into a list
  temp<- paste(c(fname, numlist[i],'.nc'), collapse='')
  temp<-read.nc(open.nc(temp))
  temp<-structure(temp, row.names = c(NA, -288), class = "data.frame")
  for (j in 3:length(varlist)) {
    newname<-paste(c("Y2005", numlist[i],".", varname[j]), collapse='')
    #1-d variables are split along one margin, 3-d variables along all three
    if (j<4){
        assign(newname, adply(temp[,varlist[j]], c(1)))}
    else{
        assign(newname, adply(temp[,varlist[j]], c(1,2,3)))}
    #record the name of the object that was just created
    files[i,j]<-newname}}

Edit: Below is a sample of the output from read.nc(open.nc()).

Answer 1

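Looking at your problem: you have a series of 3-d arrays that you want to flatten into 1-d vectors and line up with the coordinate vectors (lon, lat, and pft).
Although the plyr package has many very useful functions, they tend to be slow, as in the case above.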
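As a minimal sketch of what "flatten" means here: R stores arrays in column-major order, so the first index varies fastest when a 3-d array is turned into a vector. The tiny array and the helper pos() below are made up purely for illustration.

```r
# Tiny illustrative 3-d array (2 x 3 x 2); the values are just 1..12
arr <- array(1:12, dim = c(2, 3, 2))

# Flatten to a plain vector; the first index varies fastest
flat <- as.vector(arr)

# Hypothetical helper: position of arr[i, j, k] inside the flattened vector
pos <- function(i, j, k, d) i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]

# The element at [2, 3, 1] sits at the computed position in flat
arr[2, 3, 1] == flat[pos(2, 3, 1, dim(arr))]
```

That ordering is what lets a flattened data column be lined up with a table of coordinates without any per-element looping.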

Here is the sample data I created for testing:

#create some test data
set.seed(1)
lon<-1:300
lat<-1:200
pft<-1:15
tot<-length(lon)*length(lat)*length(pft)
pft_wtgcell<- array(rnorm(tot, 10), dim=c(length(lon),length(lat),length(pft)))
pft_wtcol<- array(rnorm(tot, 60, 2), dim=c(length(lon),length(lat),length(pft)))
FSR<- array(rnorm(tot, 100, 3), dim=c(length(lon),length(lat),length(pft)))
temp<-list(lon=lon, lat=lat, pft=pft, pft_wtgcell=pft_wtgcell, pft_wtcol=pft_wtcol, FSR=FSR)

Here is my solution:

numlist<-c('01','02','03','04','05','06','07','08','09','10','11','12')
#varlist<-c(1,2,4,8,9,21)
varname<-c("lon","lat","pft","pft_wtgcell","pft_wtcol","FSR")
files<-matrix(data=NA, nrow=12, ncol=length(varname))
#loop to cycle through the file starts here:
i<-1

#create data.frame for lon, lat and pft
newname<-paste(c("Y2005", numlist[i],".", varname[1]), collapse='')
coord<-expand.grid(temp$lon, temp$lat, temp$pft)
assign(newname, coord)
files[i,1]<-newname
#loop through the variables of interest
#  could probably be simplified.
for (j in 4:length(varname)) {
  newname<-paste(c("Y2005", numlist[i],".", varname[j]), collapse='')
  assign(newname, as.data.frame.table(temp[[varname[j] ]])$Freq)
  files[i,j]<-newname
}

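I avoided converting the sample data to a data frame and decided to work directly on the list. The expand.grid function quickly creates a data frame of all possible combinations of lon, lat, and pft. While searching for tips on how to flatten a 3-d array, I found references to the as.data.frame.table function, which is useful here; I keep only its last column, which holds the flattened data. Storing only the data you need in each data.frame should also make the rbinds run faster.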
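A small check of why this works, as a sketch on made-up data: expand.grid varies its first argument fastest, which is the same order in which R flattens an array, so the rows of the coordinate table and the flattened values line up. The encoding of indices into values (100*i + 10*j + k) is only a device for this illustration.

```r
lon <- 1:2; lat <- 1:3; pft <- 1:2

# Encode each cell's indices in its value: arr[i, j, k] == 100*i + 10*j + k
arr <- outer(outer(100 * lon, 10 * lat, "+"), pft, "+")

# Coordinate table plus the flattened data column
coord <- expand.grid(lon = lon, lat = lat, pft = pft)
coord$value <- as.data.frame.table(arr)$Freq

# Every row's value decodes back to that row's own coordinates
all(coord$value == 100 * coord$lon + 10 * coord$lat + coord$pft)

# as.vector() yields the same flattened column
identical(coord$value, as.vector(arr))
```

Since as.vector() gives the same column, it may be a slightly more direct choice than as.data.frame.table()$Freq when only the values are needed.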

I have not checked extensively for errors, but on my laptop I saw roughly a 500-fold speedup on the test case above.
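The exact speedup will depend on the machine and the array size, so here is a rough sketch for measuring it yourself with system.time. The array dimensions are arbitrary and deliberately small, and the plyr comparison is skipped when plyr is not installed.

```r
# Small synthetic 3-d array (sizes are arbitrary, chosen to keep the run short)
set.seed(1)
arr <- array(rnorm(30 * 20 * 5), dim = c(30, 20, 5))

# Base R flattening, as in the solution
t_base <- system.time(flat <- as.data.frame.table(arr)$Freq)

# plyr flattening, as in the question; skipped if plyr is not available
if (requireNamespace("plyr", quietly = TRUE)) {
  t_plyr <- system.time(slow <- plyr::adply(arr, c(1, 2, 3)))
  # Show elapsed seconds for both approaches side by side
  print(rbind(base = t_base, plyr = t_plyr)[, "elapsed", drop = FALSE])
}
```

For more stable numbers, repeat each timing several times or use a larger array.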

If this works for you, please accept the answer; otherwise I can tweak it further.

Answer 2

With the help of @Dave2e, I found the answer:

#Library Settings #------------------------------------------------------------------------------------
library(RNetCDF);library(data.table);library(plyr); library(arrayhelpers)
#File Settings #---------------------------------------------------------------------------------------
setwd("C:/Users/User/Box Sync/_PhD/PhD_Research/Albedo/Data_CLM/PFTRuns/2005/")
fname<-"b40.20th.1deg.bdrd.002bc.clm2.h0.2005-"
numlist<-c('01','02','03','04','05','06','07','08','09','10','11','12')
varlist<-c(1,2,4,8,9,21)
varname<-c("lon","lat","pft","pft_wtgcell","pft_wtcol","FSR")
files<-matrix(data=NA,ncol=12)

for (i in 1:12) {
  temp<- paste(c(fname, numlist[i],'.nc'), collapse='')
  temp<-read.nc(open.nc(temp))
  for (j in 1:3){
    vars<-temp[[varlist[j]]]
    newname<-paste(c("Y2005",".", varname[j]), collapse='')
    assign(newname, vars)}
  coord<-expand.grid(Y2005.lon, Y2005.lat, Y2005.pft)
  coord$month<-c(numlist[i])
  newname<-paste(c("Y2005", numlist[i]), collapse=''); assign(newname, coord)
  for (j in 4:length(varlist)){
    temp2<-array2df(temp[[varlist[j]]], label.x=varname[j])
    assign(newname, cbind(temp2, get(newname)))}
  files[i]<-newname}
Y2005<-lapply(files, cbind)
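For the final step of stacking the months into one data frame, one alternative to assign/get is to keep each month's data frame in a list and bind the list once. This is only a sketch under assumptions: fake_month() is a hypothetical stand-in for read.nc(open.nc(...)) that returns a much smaller list shaped like the test data in Answer 1, and the variables are looked up by name rather than by the positions in varlist.

```r
# Hypothetical stand-in for read.nc(open.nc(path)); returns a small list
# with the same element names as the test data in Answer 1
fake_month <- function(seed) {
  set.seed(seed)
  lon <- 1:4; lat <- 1:3; pft <- 1:2
  d <- c(length(lon), length(lat), length(pft))
  n <- prod(d)
  list(lon = lon, lat = lat, pft = pft,
       pft_wtgcell = array(rnorm(n, 10), dim = d),
       pft_wtcol   = array(rnorm(n, 60, 2), dim = d),
       FSR         = array(rnorm(n, 100, 3), dim = d))
}

numlist  <- sprintf("%02d", 1:12)               # '01' ... '12'
datavars <- c("pft_wtgcell", "pft_wtcol", "FSR")

# One data frame per month, collected in a list
monthly <- lapply(seq_along(numlist), function(i) {
  temp <- fake_month(i)
  df <- expand.grid(lon = temp$lon, lat = temp$lat, pft = temp$pft)
  df$month <- numlist[i]
  # Flattened data columns line up with the expand.grid rows
  for (v in datavars) df[[v]] <- as.vector(temp[[v]])
  df
})

# Stack all months in a single call
Y2005 <- do.call(rbind, monthly)
```

Binding once at the end avoids growing a data frame inside the loop, and the month column keeps track of which file each row came from.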
