Question

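I'm very new to coding, but I'm using R for a thesis project. I have a netcdf file containing daily temperature data covering all of Kazakhstan at a resolution of 0.75 degrees in both longitude and latitude, from 1979 to 2016. The 3 dimensions are time (size 13696), latitude (size 21) and longitude (size 61). Time is measured in seconds since 1900.

Is there a way to make a new array of monthly averages?

The only way I could figure out is very inefficient, and now it has stopped working. My code is below: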

mon.av <- array(dim = c(61,21,444))

for(years in 1:37) {
  if(years == 1) {
    for(x in 1:61){
      for(y in 1:21){
        mon.av[x, y, 1+(12*(years-1))] <- mean(m2tmp[x, y, 1+(years - 1)* 365:31+(years-1)*365])
        mon.av[x, y, 2+(12*(years-1))] <- mean(m2tmp[x, y, 32+(years - 1)* 365:59+(years-1)*365])
        mon.av[x, y, 3+(12*(years-1))] <- mean(m2tmp[x, y, 60+(years - 1)* 365:90+(years-1)*365])
        mon.av[x, y, 4+(12*(years-1))] <- mean(m2tmp[x, y, 91+(years - 1)* 365:120+(years-1)*365])
        mon.av[x, y, 5+(12*(years-1))] <- mean(m2tmp[x, y, 121+(years - 1)* 365:151+(years-1)*365])
        mon.av[x, y, 6+(12*(years-1))] <- mean(m2tmp[x, y, 152+(years - 1)* 365:181+(years-1)*365])
        mon.av[x, y, 7+(12*(years-1))] <- mean(m2tmp[x, y, 182+(years - 1)* 365:212+(years-1)*365])
        mon.av[x, y, 8+(12*(years-1))] <- mean(m2tmp[x, y, 213+(years - 1)* 365:243+(years-1)*365])
        mon.av[x, y, 9+(12*(years-1))] <- mean(m2tmp[x, y, 244+(years - 1)* 365:273+(years-1)*365])
        mon.av[x, y, 10+(12*(years-1))] <- mean(m2tmp[x, y, 274+(years - 1)* 365:304+(years-1)*365])
        mon.av[x, y, 11+(12*(years-1))] <- mean(m2tmp[x, y, 305+(years - 1)* 365:334+(years-1)*365])
        mon.av[x, y, 12+(12*(years-1))] <- mean(m2tmp[x, y, 335+(years - 1)* 365:365+(years-1)*365])
      }
    }
  }
}

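I had to copy this block out for every year, changing if(years == 1) to the year number each time, and also adjust it for leap years!
This seemed to work fine until year 20, when I got this error message:

Error in m2tmp[x, y, 1 + (years - 1) * 365:31 + (years - 1) * 365] : subscript out of bounds

So I'm wondering whether there is an easier way to get monthly averages of this data or, if not, what the error in my code is.

Many thanks for any help!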
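A note on the error message above: it can be reproduced without the data. In R the `:` operator is evaluated before `*` and `+`, so the index expression in the loop is read as `1 + (years - 1) * (365:31) + (years - 1) * 365`. The sketch below evaluates that expression for year 20 and compares it with a parenthesised range; the names `idx`, `start`, `end` and `intended` are introduced here only for illustration.

```r
# Operator precedence in R: ':' is evaluated before '*' and '+'.
years <- 20

# The index expression exactly as written in the loop (January).
idx <- 1 + (years - 1) * 365:31 + (years - 1) * 365
length(idx)  # 335 values instead of 31
max(idx)     # 13871, beyond the 13696 available time steps

# Computing both ends first and then building the range gives 31 days.
start <- 1 + (years - 1) * 365
end   <- 31 + (years - 1) * 365
intended <- start:end
range(intended)  # 6936 6966
```

For years 1 to 19 the largest index produced by the original expression stays below 13696, which would explain why the error only appears at year 20. Even with parentheses, a fixed 365-day year still ignores leap years, so grouping by calendar month is the more robust route.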

Answer 1

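For efficiency in R, I would not loop at all. Instead I would create a data frame with the columns LAT, LON, TIME and TEMP. As I understand it, there are two steps:

1) Convert the seconds since 1900 to year/month

2) Compute the monthly average for each location (i.e. lat/lon)

Here I create a dummy dataset with some random coordinates inside Kazakhstan, some time values in seconds since 1900, and some random temperatures. Time, lat, lon: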

time = c(1, 13696, 1)
lat = rnorm(21, mean=46.77, sd=0.1)
lon = rnorm(61, mean=49.77, sd=0.1)

Create the data.frame:

climatedata = expand.grid(time=time, lat=lat, lon=lon)
> head(climatedata)
   time      lat     lon
1     1 46.85790 49.6907
2 13696 46.85790 49.6907
3     1 46.85790 49.6907
4     1 46.76574 49.6907
5 13696 46.76574 49.6907
6     1 46.76574 49.6907

Create random temperature data:

climatedata$temp = rnorm(dim(climatedata)[1], mean=20, sd=5)

Convert the time from seconds since 1900 to year-month (*I am not completely sure this is correct; better check this conversion: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ColeBeck/datestimes.pdf)

climatedata$date <- format(as.Date(as.POSIXct(climatedata$time, origin="1900-01-01")), "%Y-%m")

> head(climatedata)
   time      lat     lon     temp    date
1     1 46.85790 49.6907 22.25540 1900-01
2 13696 46.85790 49.6907 15.39590 1900-01
3     1 46.85790 49.6907 19.23888 1900-01
4     1 46.76574 49.6907 16.94528 1900-01
5 13696 46.76574 49.6907 11.92085 1900-01
6     1 46.76574 49.6907 22.44737 1900-01
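One way to check the conversion is to feed it a few time values whose calendar date is known in advance. The sketch below is only a sanity check under the assumption that the time axis really is "seconds since 1900-01-01 00:00 UTC"; the values in `secs` are made up for the test, and the time zone is pinned to UTC so the result does not depend on the local machine.

```r
# Hypothetical time values, in seconds since 1900-01-01 00:00 UTC:
# the origin itself, 31 days later, and 28854 days later (1979-01-01).
secs <- c(0, 31 * 86400, 28854 * 86400)

# Convert to date-times with an explicit time zone, then to year-month labels.
stamps <- as.POSIXct(secs, origin = "1900-01-01", tz = "UTC")
ym <- format(stamps, "%Y-%m")
ym  # "1900-01" "1900-02" "1979-01"
```

If the file's time variable turns out to use a different unit (for example hours), the values would need to be scaled to seconds first; the `units` attribute of the time variable in the netcdf file states which unit applies.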

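Then use the plyr package to compute the mean temperature for each combination of date (year-month) and location (lat/lon), like so:

library(plyr)
avClimateData = ddply(climatedata, .(date, lon, lat),
                      summarise, monthlyAv = mean(temp))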
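The same grouping can also be done directly on a lon x lat x time array in base R, without reshaping to a data frame. This is a minimal sketch, not a tested solution for the real file: the array sizes, the two-year date range and the random values are stand-ins, and it assumes one date per time step is already available (for example from a conversion like the one above).

```r
set.seed(1)

# Stand-in data: a small lon x lat x time array of daily values.
n_lon <- 3
n_lat <- 2
days  <- seq(as.Date("1979-01-01"), as.Date("1980-12-31"), by = "day")
m2tmp <- array(rnorm(n_lon * n_lat * length(days)),
               dim = c(n_lon, n_lat, length(days)))

# One year-month label per time step; the calendar takes care of leap years.
ym <- format(days, "%Y-%m")

# For every grid cell, average its time series within each year-month.
# apply() puts the month dimension first, so aperm() restores lon x lat x month.
mon.av <- apply(m2tmp, c(1, 2), function(ts) tapply(ts, ym, mean))
mon.av <- aperm(mon.av, c(2, 3, 1))

dim(mon.av)  # 3 2 24
```

Because the months are defined by the date labels rather than by fixed day offsets, February 1980 automatically gets 29 days and no per-year copy of the code is needed.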

Answer 2

I think this is easier to do in bash with CDO before calling R:

$ cdo monmean input.nc output.nc

You can find the documentation here: https://code.zmaw.de/projects/cdo/wiki/Cdo#Documentation

If you don't have it installed, on Ubuntu:

sudo apt-get install cdo

An efficient way to get monthly averages from a netcdf file in R
