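So I have a large 3D data matrix, e.g. 10000x10000x1000. I need to iterate over every element of this 3D matrix and write a file with one line per element, containing the indices and the corresponding values from two different matrices of the same size, for example: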
i j k val1 val2
My current code runs three nested loops and prints each line as shown below. Here is the approach, using two small 3D data matrices as an example:
import numpy as np
vv1 = np.array([[[1,2,3],[2,3,4],[3,4,5]],
                [[4,5,6],[5,6,7],[6,7,8]],
                [[7,8,9],[8,9,10],[9,10,11]]])
vv2 = np.array([[[1,2,3],[2,3,4],[3,4,5]],
                [[4,5,6],[5,6,7],[6,7,8]],
                [[7,8,9],[8,9,10],[9,10,11]]])
for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            print("{:} {:} {:} {:} {:}".format(x, y, z, vv1[x, y, z], vv2[x, y, z]))
This simple code does the job, but it is slow.
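Another approach I thought of is to build one long 1D list in which each entry holds the three index values, and then apply the same printing logic. For example, building it with nested loops: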
vv_ind = []
for x in range(vv1.shape[0]):
    for y in range(vv1.shape[1]):
        for z in range(vv1.shape[2]):
            vv_ind.append([x, y, z])
for elem in vv_ind:
    i = tuple(elem)
    print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))
This gives the desired output.
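As an aside, the nested loops that build vv_ind can be replaced by np.indices. A minimal sketch, assuming small dummy arrays in place of the real data; note that this only speeds up building the index list, while the print loop the question asks about is still a Python-level loop:

```python
import numpy as np

# Dummy 3x3x3 arrays standing in for the question's vv1/vv2 (assumed values)
vv1 = np.arange(27).reshape(3, 3, 3)
vv2 = 10 * vv1

# np.indices has shape (3, nx, ny, nz); reshape(3, -1).T gives one
# [x, y, z] row per element, in the same order as the nested loops
vv_ind = np.indices(vv1.shape).reshape(3, -1).T

# The per-element print loop itself is unchanged (and still pure Python)
for elem in vv_ind:
    i = tuple(elem)
    print(*elem, vv1[i], vv2[i])
```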
My question is about the final print loop:
for elem in vv_ind:
    i = tuple(elem)
    print("{:} {:} {:} {:} {:}".format(*elem, vv1[i], vv2[i]))
Is there a more efficient way to do this?
Again, the arrays given here are just dummies.
Any help is appreciated.
Answer 0 (score: 2)
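You can use np.mgrid to generate the indices and, if you don't mind storing everything with the same data type, stack the arrays together and save the result via np.save or np.savetxt. Alternatively, you can iterate over the array indices with np.ndindex. An example with np.mgrid and np.save: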
In [1]: import numpy as np
In [2]: a = np.random.randint(0, 255, size=(4, 4, 4))
In [3]: b = np.random.randint(0, 255, size=(4, 4, 4))
In [4]: data = np.stack([x.ravel() for x in np.mgrid[:4, :4, :4]] + [a.ravel(), b.ravel()], axis=1)
In [5]: np.save('/tmp/test.npy', data)
In [6]: data
Out[6]:
array([[ 0, 0, 0, 169, 35],
[ 0, 0, 1, 14, 120],
[ 0, 0, 2, 93, 207],
[ 0, 0, 3, 70, 158],
[ 0, 1, 0, 115, 52],
[ 0, 1, 1, 10, 248],
[ 0, 1, 2, 5, 123],
[ 0, 1, 3, 125, 143],
[ 0, 2, 0, 73, 241],
[ 0, 2, 1, 25, 118],
[ 0, 2, 2, 240, 159],
[ 0, 2, 3, 60, 179],
[ 0, 3, 0, 29, 221],
[ 0, 3, 1, 214, 33],
[ 0, 3, 2, 145, 60],
[ 0, 3, 3, 207, 74],
[ 1, 0, 0, 7, 37],
[ 1, 0, 1, 146, 192],
[ 1, 0, 2, 227, 83],
[ 1, 0, 3, 247, 51],
[ 1, 1, 0, 253, 18],
[ 1, 1, 1, 188, 2],
[ 1, 1, 2, 164, 252],
[ 1, 1, 3, 192, 229],
[ 1, 2, 0, 18, 236],
[ 1, 2, 1, 85, 48],
[ 1, 2, 2, 20, 233],
[ 1, 2, 3, 81, 152],
[ 1, 3, 0, 122, 30],
[ 1, 3, 1, 227, 221],
[ 1, 3, 2, 11, 247],
[ 1, 3, 3, 84, 203],
[ 2, 0, 0, 5, 94],
[ 2, 0, 1, 174, 179],
[ 2, 0, 2, 224, 222],
[ 2, 0, 3, 168, 40],
[ 2, 1, 0, 160, 136],
[ 2, 1, 1, 16, 121],
[ 2, 1, 2, 237, 241],
[ 2, 1, 3, 70, 29],
[ 2, 2, 0, 127, 188],
[ 2, 2, 1, 33, 67],
[ 2, 2, 2, 4, 138],
[ 2, 2, 3, 153, 114],
[ 2, 3, 0, 162, 8],
[ 2, 3, 1, 254, 91],
[ 2, 3, 2, 153, 69],
[ 2, 3, 3, 167, 33],
[ 3, 0, 0, 99, 101],
[ 3, 0, 1, 26, 2],
[ 3, 0, 2, 162, 131],
[ 3, 0, 3, 23, 97],
[ 3, 1, 0, 226, 37],
[ 3, 1, 1, 5, 130],
[ 3, 1, 2, 215, 164],
[ 3, 1, 3, 247, 95],
[ 3, 2, 0, 138, 49],
[ 3, 2, 1, 248, 175],
[ 3, 2, 2, 134, 39],
[ 3, 2, 3, 170, 67],
[ 3, 3, 0, 1, 177],
[ 3, 3, 1, 245, 31],
[ 3, 3, 2, 71, 160],
[ 3, 3, 3, 81, 9]])
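The answer mentions np.savetxt and np.ndindex without showing them. A minimal sketch, assuming random 4x4x4 integer arrays as in the session above (generated here with np.random.default_rng so the run is reproducible), that writes the stacked data as text with np.savetxt and checks it against the lines an np.ndindex loop produces:

```python
import io
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 255, size=(4, 4, 4))
b = rng.integers(0, 255, size=(4, 4, 4))

# Same stacking as above: one row per element, columns i, j, k, a, b
data = np.stack([x.ravel() for x in np.mgrid[:4, :4, :4]] + [a.ravel(), b.ravel()], axis=1)

# Text output via np.savetxt; a single '%d' is repeated for every column
buf = io.StringIO()
np.savetxt(buf, data, fmt='%d')
text_savetxt = buf.getvalue()

# The same lines built by iterating over the indices with np.ndindex
lines = []
for idx in np.ndindex(a.shape):
    lines.append(" ".join(str(v) for v in (*idx, a[idx], b[idx])))
text_ndindex = "\n".join(lines) + "\n"

print(text_savetxt == text_ndindex)  # True
```

Both produce rows in C (row-major) order, which is why the two texts agree line for line.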
Answer 1 (score: 1)
To create the list of indices, you can use the function product:
from itertools import product
product(*3 * [range(3)]) # generator of indices
or
product(range(3), range(3), range(3))
or
from itertools import product, repeat
product(*repeat(range(3), 3))
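All three spellings yield the same index tuples in C (row-major) order. itertools.product also accepts a repeat keyword, and for a non-cubic shape such as the question's 10000x10000x1000 the ranges can be built from the shape itself. A quick check, with the (2, 3, 4) shape chosen only for illustration:

```python
from itertools import product, repeat

# The three spellings above, plus product's own repeat= keyword
p1 = list(product(*3 * [range(3)]))
p2 = list(product(range(3), range(3), range(3)))
p3 = list(product(*repeat(range(3), 3)))
p4 = list(product(range(3), repeat=3))

print(p1 == p2 == p3 == p4)  # True
print(len(p4), p4[:4])       # 27 [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]

# For an arbitrary shape, build one range per axis
shape = (2, 3, 4)  # illustrative non-cubic shape
p5 = list(product(*map(range, shape)))
print(len(p5))               # 24
```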
You can then simplify your code:
from itertools import product, repeat
for idx in product(*repeat(range(3), 3)):
    print(*idx, vv1[idx], vv2[idx])
As @a_guest mentioned in the comments, we can use np.ndindex(*vv1.shape) instead of product(*repeat(range(3), 3)).
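With np.ndindex the loop is no longer tied to a 3x3x3 shape. A minimal sketch using the question's dummy arrays (vv2 is taken as a copy of vv1 here, since the two are identical in the question):

```python
import numpy as np

vv1 = np.array([[[1, 2, 3], [2, 3, 4], [3, 4, 5]],
                [[4, 5, 6], [5, 6, 7], [6, 7, 8]],
                [[7, 8, 9], [8, 9, 10], [9, 10, 11]]])
vv2 = vv1.copy()

# np.ndindex yields index tuples in C order for any shape,
# so no dimension is hard-coded
for idx in np.ndindex(*vv1.shape):
    print(*idx, vv1[idx], vv2[idx])
```

This removes the hard-coded ranges but is still one Python iteration per element, so for very large arrays it is no faster than the product version.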
Answer 2 (score: 1)
If the data are not integers, you can still do this with np.savetxt by using a structured array:
import numpy as np
import io
# Data
vv1 = np.array([[[ 1, 2, 3], [ 2, 3, 4],[ 3, 4, 5]],
[[ 4, 5, 6], [ 5, 6, 7],[ 6, 7, 8]],
[[ 7, 8, 9], [ 8, 9, 10],[ 9, 10, 11]]], np.float32)
vv2 = np.array([[[ 1, 2, 3], [ 2, 3, 4],[ 3, 4, 5]],
[[ 4, 5, 6], [ 5, 6, 7],[ 6, 7, 8]],
[[ 7, 8, 9], [ 8, 9, 10],[ 9, 10, 11]]], np.float32)
xx, yy, zz = np.meshgrid(*map(range, vv1.shape), indexing='ij')
# Structured array of indices and data
a = np.empty(vv1.size, dtype='i,i,i,f,f')  # one record per element (idx was undefined)
a['f0'] = xx.ravel()
a['f1'] = yy.ravel()
a['f2'] = zz.ravel()
a['f3'] = vv1.ravel()
a['f4'] = vv2.ravel()
# Using StringIO here to show result, normally would use a file object or file name
s = io.StringIO()
np.savetxt(s, a, fmt='%d %d %d %.3f %.3f')
print(s.getvalue())
Output:
0 0 0 1.000 1.000
0 0 1 2.000 2.000
0 0 2 3.000 3.000
0 1 0 2.000 2.000
0 1 1 3.000 3.000
0 1 2 4.000 4.000
0 2 0 3.000 3.000
0 2 1 4.000 4.000
0 2 2 5.000 5.000
1 0 0 4.000 4.000
1 0 1 5.000 5.000
1 0 2 6.000 6.000
1 1 0 5.000 5.000
1 1 1 6.000 6.000
1 1 2 7.000 7.000
1 2 0 6.000 6.000
1 2 1 7.000 7.000
1 2 2 8.000 8.000
2 0 0 7.000 7.000
2 0 1 8.000 8.000
2 0 2 9.000 9.000
2 1 0 8.000 8.000
2 1 1 9.000 9.000
2 1 2 10.000 10.000
2 2 0 9.000 9.000
2 2 1 10.000 10.000
2 2 2 11.000 11.000
np.savetxt actually just loops over the data internally, so it is not magically faster, and it may not be worth creating extra large arrays just for this.
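If the concern is memory rather than speed, one option is to write the file in slabs so that only one slice's worth of helper arrays exists at a time. A minimal sketch; the function name write_in_slabs is hypothetical, and the slab size (one index along the first axis) is an assumption: for slices as large as 10000x10000 one would likely chunk more finely. Because np.savetxt still loops over rows internally, this bounds memory use but should not be expected to run faster.

```python
import io
import numpy as np

def write_in_slabs(f, vv1, vv2, fmt='%d %d %d %.3f %.3f'):
    """Hypothetical helper: write 'i j k val1 val2' lines one slice
    along axis 0 at a time instead of stacking the whole volume."""
    nx, ny, nz = vv1.shape
    # The j, k index columns are the same for every slice, so build them once
    jj, kk = np.meshgrid(np.arange(ny), np.arange(nz), indexing='ij')
    jj, kk = jj.ravel(), kk.ravel()
    for i in range(nx):
        ii = np.full(jj.shape, i)
        # column_stack upcasts to float; '%d' still prints the indices as integers
        block = np.column_stack([ii, jj, kk, vv1[i].ravel(), vv2[i].ravel()])
        np.savetxt(f, block, fmt=fmt)

# Usage with small dummy data (assumed values)
vv1 = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
vv2 = 2 * vv1
buf = io.StringIO()
write_in_slabs(buf, vv1, vv2)
lines = buf.getvalue().splitlines()
print(len(lines))  # 27
print(lines[5])    # 0 1 2 5.000 10.000
```

To write to disk instead of a StringIO, pass an open text file, e.g. with open('out.txt', 'w') as f: write_in_slabs(f, vv1, vv2), where 'out.txt' is a placeholder name.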