Equivalent Python for the following R index-averaging code

Time: 2014-09-22 23:17:10

Tags: python r

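The commands below compute the average of the values at specific x,y,z indices of an array. In the larger problem I'm trying to solve, a 2 GB file is read into a 4D array whose first three dimensions are spatial (x, y, z) and whose fourth dimension is time. Since writing this R script, I have started using Python to read the 2 GB data file, and I'd like to convert the R lines below to Python so I can do everything in one language. Does anyone know the equivalent Python?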

# create a small example dataset for testing out script
test_dat <- array(rnorm(10*10*4*50), dim=c(10,10,4,50))  

# create a list of specific indices I want the average of (arbitrary
# in this case, but not in the larger problem at hand)
xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1)) 

# bind the index data into a matrix for the next step
m <- do.call(rbind,xyz_index) ## 4x3 matrix 

# will return the average of the values in test_dat that are 
# in the positions listed in xyz_index for each time index 
# (50 values in this small problem)
sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)])) 

1 answer:

Answer 0: (score: 0)

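I think this is what you want; please confirm. It turns out that concatenating multiple slices is quite painful in numpy, and apparently in pandas too. You really don't want to write obscure code like these stride tricks.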

import numpy as np

test_dat = np.random.randn(10,10,4,50)

#xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1))     
#m <- do.call(rbind,xyz_index) ## 4x3 matrix 
# R indices are 1-based and numpy indices are 0-based, so every R index is reduced by 1 here
# I'm not sure about this, but it seems to get the 4x50 submatrix of concatenated slices
# see https://stackoverflow.com/questions/21349133/numpy-array-integer-indexing-in-more-than-one-dimension
m = np.r_[ '0,2', test_dat[1,9,0,:], test_dat[3,4,0,:], test_dat[5,6,0,:], test_dat[8,2,0,:] ]

# Compute .mean() over the 4 positions (axis 0), giving one value per time index (50 values)
#sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))
m.mean(axis=0)

By the way, I also looked at numpy masked arrays.
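As an alternative to writing out each slice by hand, numpy's integer-array indexing can play the role of R's matrix indexing with `cbind(m, i)`. The following is only a minimal sketch under a few assumptions: the names `xyz_index_r`, `idx`, `selected` and `time_means` are made up for illustration, the index triples are the 1-based ones from the question, and the random data merely stands in for the real 2 GB array.

```python
import numpy as np

# Small random stand-in for the real 4D array (x, y, z, time)
rng = np.random.default_rng(0)
test_dat = rng.standard_normal((10, 10, 4, 50))

# The 1-based (x, y, z) triples from the R script (illustrative name)
xyz_index_r = np.array([(2, 10, 1), (4, 5, 1), (6, 7, 1), (9, 3, 1)])

# numpy is 0-based, so shift every index down by one
idx = xyz_index_r - 1

# One index array per spatial axis selects the 4 positions;
# the trailing slice keeps the whole time axis, giving shape (4, 50)
selected = test_dat[idx[:, 0], idx[:, 1], idx[:, 2], :]

# Average over the 4 positions, leaving one mean per time index
time_means = selected.mean(axis=0)
print(time_means.shape)
```

Because the positions live in an ordinary array, the same two indexing lines should work unchanged for a longer list of positions, which is the case described for the larger problem.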