Question

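I need to efficiently extract temperature data from grib2 files, which I have loaded into a raster stack. Each raster layer in the stack represents one point in time.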

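Now I need to extract a single value for each observation (x, y, t). The following code does the job, but it takes far too long. Any suggestions for making it more efficient are very welcome.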

library(raster)   # stack(), getValues()
library(tictoc)   # tic(), toc()

files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE)

s <- stack(files)

# random column (x) and row (y) index for each observation
userdata$x <- sample(1:ncol(s), nrow(userdata), replace = TRUE)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace = TRUE)

smalldata <- userdata[1:100,]
tic()
# layer t, row y, column x
smalldata$temp1morning <- getValues(s[[smalldata$t]], smalldata$y)[smalldata$x]
toc()

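Edit: loki's answer was very helpful. However, when I apply this approach to my temperature data, it is really slow. I suspect this is caused by the structure of my temperature data, which has many time periods. Below is a comparison between the suggested approach and a similar attempt using getValues. Any ideas why this happens, or how I could improve the code?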

> files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))
> 
> s <- stack(files)
> s
class       : RasterStack 
dimensions  : 197, 821, 161737, 971  (nrow, ncol, ncell, nlayers)
resolution  : 0.25, 0.25  (x, y)
extent      : 190.875, 396.125, 22.875, 72.125  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371229 +b=6371229 +no_defs 
names       : gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, ... 

> 
> userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
> userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)
> 
> smalldata <- data.frame(x = userdata$x[1:2],
+ y = userdata$y[1:2],
+ t = userdata$t[1:2])
> 
> smalldata
    x  y   t
1 142 67 547
2 779 14 829
> 
> tic("apply")
> smalldata$temp1morning <- apply(smalldata, 1, function(x){s[x[2], x[1]][x[3]]})
> toc()
apply: 305.41 sec elapsed
> 
> tic("getValues")
> smalldata$temp2morning <- apply(smalldata, 1, function(x){getValues(s[[x[3]]], x[2])[x[1]]})
> toc()
getValues: 0.33 sec elapsed
> 
> smalldata
    x  y   t temp1morning temp2morning
1 142 67 547    13.650018    13.650018
2 779 14 829    -1.750006    -1.750006
>
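A plausible reason for the gap in the transcript above is an assumption, not something tested on these GRIB files. On a file-backed RasterStack, `s[row, col]` returns that cell's value in every layer, so each call touches all 971 files. By contrast, `getValues(s[[t]], row)` reads one row of one layer. Building on that idea, here is a sketch that reads each required layer only once and then picks all of that layer's cells by cell number. The helper `extract_by_layer` is hypothetical. The example uses a small in-memory stack in place of the GRIB files.

```r
library(raster)

# Hypothetical helper: one value per observation, with x = column index,
# y = row index and t = layer index (the convention used in the question).
extract_by_layer <- function(s, x, y, t) {
  # raster numbers cells row by row, starting at the upper-left corner
  cells <- cellFromRowCol(s, y, x)
  out <- rep(NA_real_, length(t))
  # Read each required layer once, then index all of its cells together
  for (layer in unique(t)) {
    idx <- which(t == layer)
    out[idx] <- getValues(s[[layer]])[cells[idx]]
  }
  out
}

# Small reproducible stack with three distinct layers
set.seed(1)
r <- raster(ncol = 100, nrow = 100)
s <- stack(setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))))

obs <- data.frame(x = sample(1:100, 10, replace = TRUE),
                  y = sample(1:100, 10, replace = TRUE),
                  t = sample(1:3, 10, replace = TRUE))
obs$temp <- extract_by_layer(s, obs$x, obs$y, obs$t)
```

The number of layer reads is then the number of distinct `t` values rather than the number of observations times the number of layers.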

Answer 1

Let's start with a reproducible example:

library(raster)
r <- raster(ncol = 100, nrow = 100)   # 100 x 100 raster
r[] <- runif(ncell(r))                # fill it with random values

s <- stack(r, r, r)                   # three-layer stack
s

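Now, let's assume your userdata has the following structure:

x is the index of the pixel row
y is the index of the pixel column
t is the index of the layer we are looking for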
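Note that this answer treats x as the row, while the question samples x from 1:ncol(s), that is, as the column. The question's transcript therefore calls s[x[2], x[1]]. raster can address a cell either by (row, column) or by a single cell number. The sketch below uses the 100 x 100 raster from above to show how the two relate. Cells are numbered row by row from the upper-left corner, so cell = (row - 1) * ncol + column. `cellFromRowCol()` and `rowColFromCell()` convert between the two forms.

```r
library(raster)

r <- raster(ncol = 100, nrow = 100)

row <- 2
col <- 3
cell <- cellFromRowCol(r, row, col)
# Row-wise numbering: (2 - 1) * 100 + 3
cell                      # 103
rowColFromCell(r, cell)   # one-row matrix: row 2, col 3
```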

Let's create reproducible user data:

userdata <- data.frame(x = sample(1:100, 10),                   # row index
                       y = sample(1:100, 10),                   # column index
                       t = sample(1:3, 10, replace = TRUE))     # layer index

We can then use apply to process every row of userdata, using the row, column and layer indices to extract the values:

userdata$pixelvalue <- apply(userdata, 1, function(x){s[x[1], x[2]][x[3]]})
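For an in-memory stack, the per-row apply() call can be replaced by a single extraction. This sketch assumes, as this answer does, that x is the row and y the column. extract() with cell numbers returns a matrix with one row per cell and one column per layer. A two-column index matrix then picks one layer per row. This still reads every layer for every requested cell, so on a file-backed stack with many layers it may not avoid the slowdown reported in the question.

```r
library(raster)

set.seed(42)
r <- raster(ncol = 100, nrow = 100)
s <- stack(setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))))

userdata <- data.frame(x = sample(1:100, 10),
                       y = sample(1:100, 10),
                       t = sample(1:3, 10, replace = TRUE))

# x = row, y = column, as assumed in this answer
cells <- cellFromRowCol(s, userdata$x, userdata$y)
# One row per observation, one column per layer
vals <- extract(s, cells)
# Pick column t of row i for every observation i
userdata$pixelvalue <- vals[cbind(seq_len(nrow(userdata)), userdata$t)]
```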

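In each iteration of apply, x is one row of userdata. x[1] and x[2] select the pixel by its row and column, which returns that pixel's value in every layer of the stack. x[3] then keeps only the value of the requested layer.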

This follows the logic:

stack[*row*, *column*][*layer*]
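To see the stack[row, column][layer] logic in isolation, here is a small check on a three-layer stack. Its layers hold the constants 1, 2 and 3, chosen only so the result is easy to read. The first subset returns the cell's value in every layer, and the second keeps one of them.

```r
library(raster)

r <- raster(ncol = 100, nrow = 100)
s <- stack(setValues(r, rep(1, ncell(r))),
           setValues(r, rep(2, ncell(r))),
           setValues(r, rep(3, ncell(r))))

all_layers <- s[5, 7]   # row 5, column 7: one value per layer (1, 2, 3)
all_layers[2]           # only layer 2: 2
```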

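The advantage of this approach is that you do not have to convert the whole raster into a vector (which is basically what getValues does). Instead, you access the data directly through the matrix structure of the stack.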
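Both access paths return the same value, as the results in the question's timing comparison also show. The sketch below compares them on a small random stack, with x = row and y = column as assumed above. The difference lies only in what is read. s[row, col] fetches that cell from every layer, while getValues(s[[t]], row) fetches one row of one layer.

```r
library(raster)

set.seed(7)
r <- raster(ncol = 100, nrow = 100)
s <- stack(setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))),
           setValues(r, runif(ncell(r))))

row <- 10; col <- 20; layer <- 3

v_stack     <- s[row, col][layer]                 # cell in all layers, then one layer
v_getvalues <- getValues(s[[layer]], row)[col]    # one row of one layer, then one column
```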

Answer 2

I found a simple solution that works for me. First, I read the temperature data into an array with as.array. Then I use apply on the array, as loki suggested:

library(raster)   # stack(), as.array()
library(tictoc)   # tic(), toc()

files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))

s <- stack(files)
a <- as.array(s)   # all layers in memory: dim = nrow x ncol x nlayers

userdata$x <- sample(1:ncol(s), nrow(userdata), replace = TRUE)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace = TRUE)

smalldata <- data.frame(x = userdata$x,
            y = userdata$y,
            t = userdata$t)

tic("array")
# a[row, column, layer] = a[y, x, t]
userdata$temp1morning <- apply(smalldata, 1, function(x){a[x[2], x[1], x[3]]})
toc()
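Once the data are in an array, the apply() loop is not strictly needed. Base R accepts an index matrix with one column per dimension, so a[cbind(y, x, t)] returns one value per observation in a single vectorized call. The sketch below uses a small random array as a stand-in for as.array(s) (dimensions rows x columns x layers), with x as the column and y as the row, as in the question.

```r
set.seed(1)
# Stand-in for as.array(s): 5 rows x 4 columns x 3 layers
a <- array(runif(5 * 4 * 3), dim = c(5, 4, 3))

obs <- data.frame(x = c(1, 4, 2),   # column index
                  y = c(5, 1, 3),   # row index
                  t = c(2, 3, 1))   # layer index

# Each row of the index matrix selects one element a[y, x, t]
obs$temp <- a[cbind(obs$y, obs$x, obs$t)]
```

With the real data, the apply() line above would become userdata$temp1morning <- a[cbind(userdata$y, userdata$x, userdata$t)].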

This is fast enough for my purposes. loki, thanks for your help!
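One caveat of as.array() is that it loads every value into memory. Here is a rough estimate for the stack shown in the question (197 rows, 821 columns, 971 layers), assuming 8-byte doubles:

```r
dims <- c(nrow = 197, ncol = 821, nlayers = 971)
n_values <- prod(dims)   # 157,046,627 values
bytes <- n_values * 8    # 8 bytes per double
bytes / 1e9              # about 1.26 GB
```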

Efficiently accessing data from a raster stack in R
