Question

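The commands below find the mean of the values at specific x, y, z indices of an array. In the larger problem I am trying to solve, a 2 GB file is read into a 4D array whose first three dimensions are spatial (x, y, z) and whose fourth dimension is time. Since writing this R script I have started using Python to read the 2 GB data file, and I would like to translate the R lines below into Python so that I can do everything in one language. Does anyone know the Python equivalent?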

# create a small example dataset for testing out script
test_dat <- array(rnorm(10*10*4*50), dim=c(10,10,4,50))  

# create a list of specific indices I want the average of (arbitrary
# in this case, but not in the larger problem at hand)
xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1)) 

# bind the index data into a matrix for the next step
m <- do.call(rbind,xyz_index) ## 4x3 matrix 

# will return the average of the values in test_dat that are 
# in the positions listed in xyz_index for each time index 
# (50 values in this small problem)
sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))

Answer 1

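I think this is what you want; please confirm. Concatenating multiple slices into one result is quite painful in numpy, and apparently in pandas too. You really don't want to write code as obscure as these stride tricks.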

import numpy as np

test_dat = np.random.randn(10,10,4,50)

# R: xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1))
# R: m <- do.call(rbind,xyz_index) ## 4x3 matrix
# R indices are 1-based and numpy indices are 0-based, so subtract 1 from every index.
# I'm not sure about this, but it seems to get the 4x50 submatrix of concatenated slices
# see https://stackoverflow.com/questions/21349133/numpy-array-integer-indexing-in-more-than-one-dimension
m = np.r_[ '0,2', test_dat[1,9,0,:], test_dat[3,4,0,:], test_dat[5,6,0,:], test_dat[8,2,0,:] ]

# Average over the 4 positions (axis 0) to get one mean per time index (50 values)
# R: sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))
m.mean(axis=0)

By the way, I also looked at numpy masked arrays.
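For a longer list of positions, writing out each slice by hand gets tedious. A possible alternative, offered as a sketch rather than a tested drop-in for the 2 GB case, is numpy integer-array indexing on the first three axes. The names `xyz_index`, `idx`, `selected` and `means` are only illustrative, and the sketch assumes the positions are given 1-based as in the R script.

```python
import numpy as np

# Small example dataset with the same shape as in the question
rng = np.random.default_rng(0)
test_dat = rng.standard_normal((10, 10, 4, 50))

# (x, y, z) positions written 1-based, exactly as in the R script (assumption)
xyz_index = np.array([[2, 10, 1], [4, 5, 1], [6, 7, 1], [9, 3, 1]])

# Convert to numpy's 0-based indexing
idx = xyz_index - 1

# Integer-array indexing on the three spatial axes selects one full
# time series per position, giving an array of shape (4, 50)
selected = test_dat[idx[:, 0], idx[:, 1], idx[:, 2], :]

# Average over the positions (axis 0): one mean per time index, shape (50,)
means = selected.mean(axis=0)
```

This mirrors `test_dat[cbind(m, i)]` inside the `sapply` call: each row of `idx` plays the role of one row of `m`, and the trailing `:` covers all time indices at once instead of looping over `i`.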
