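I have a numpy array of size (4, X, Y), where the first dimension represents an (R, G, B, A) quadruplet.
My goal is to convert the X*Y RGBA quadruplets into X*Y float values, given a dictionary that maps each quadruplet to its value.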
My current code is as follows:
codeTable = {
    (255, 255, 255, 127): 5.5,
    (128, 128, 128, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}
for i in range(0, rows):
    for j in range(0, cols):
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)
where data is a numpy array of size (4, rows, cols) and new_data has size (rows, cols).
The code works, but it takes a long time. How can I optimize it?
Here is a complete example:
import numpy

codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

# test data: shape (4, rows, cols) = (4, 3, 2)
rows = 3
cols = 2
data = numpy.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

new_data = numpy.zeros((rows, cols), numpy.float32)
for i in range(0, rows):
    for j in range(0, cols):
        # look up the RGBA quadruplet of pixel (i, j); -9999 if it is not in the table
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

# expected result for `new_data`:
# array([[ 5.50000000e+00,  7.50000000e+00],
#        [ 6.50000000e+00, -9.99900000e+03],
#        [ 6.50000000e+00, -9.99900000e+03]], dtype=float32)
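Answer 0 (score: 1)
Here is an approach that returns the expected result. With this little data it is hard to tell whether it will be faster for you, but since it avoids the Python-level double loop, I expect you will see a decent speedup on real-sized arrays.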
import numpy
import pandas as pd

codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

# test data
rows = 3
cols = 2
data = numpy.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

# reference result from the original double loop
new_data = numpy.zeros((rows, cols), numpy.float32)
for i in range(0, rows):
    for j in range(0, cols):
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

def create_output(data):
    # Reshape your two data sources to be a bit more sane:
    # one row per pixel, columns 0..3 = R, G, B, A
    reshaped_data = data.reshape((4, -1))
    df = pd.DataFrame(reshaped_data).T

    # one row per table entry, columns 0..3 = R, G, B, A and column 4 = value
    reshaped_codeTable = []
    for key in codeTable.keys():
        reshaped = list(key) + [codeTable[key]]
        reshaped_codeTable.append(reshaped)
    ct = pd.DataFrame(reshaped_codeTable)

    # Merge on the shared columns 0..3 (a left merge keeps the pixel order),
    # then replace missing merges with -9999
    result = df.merge(ct, how='left')
    newest_data = result[4].fillna(-9999)

    # Reshape back to (rows, cols); Series.reshape has been removed from pandas,
    # so convert to a NumPy array first
    output = newest_data.to_numpy().reshape(rows, cols)
    return output

output = create_output(data)
print(output)
# array([[ 5.50000000e+00,  7.50000000e+00],
#        [ 6.50000000e+00, -9.99900000e+03],
#        [ 6.50000000e+00, -9.99900000e+03]])
print(numpy.array_equal(new_data, output))
# True
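This answer, like the numpy_indexed one, relies on `data.reshape((4, -1)).T` turning the (4, rows, cols) array into one RGBA row per pixel, and on a final `reshape(rows, cols)` putting the per-pixel results back in place. A minimal sketch with made-up data (each value encodes its channel and pixel position, an assumption purely for readability) that makes the row-major ordering visible:

```python
import numpy as np

rows, cols = 3, 2
# Made-up data: channel c of pixel (i, j) holds 10*c + (i*cols + j)
data = np.array([[[10 * c + i * cols + j for j in range(cols)]
                  for i in range(rows)]
                 for c in range(4)])

# One RGBA row per pixel, pixels in row-major (C) order
flat = data.reshape((4, -1)).T
print(flat.shape)  # (6, 4)

# Row k of `flat` is the quadruplet of pixel divmod(k, cols)
k = 3
i, j = divmod(k, cols)
print(flat[k], data[:, i, j])

# A per-pixel result of length rows*cols folds back with reshape(rows, cols)
per_pixel = flat[:, 0]  # here simply the R channel of every pixel
print(per_pixel.reshape(rows, cols))
```

Because the flattening and the final reshape both use C order, pixel (i, j) always lands at row `i * cols + j` and comes back to the same position, which is why the merge result can be reshaped directly.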
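Answer 1 (score: 1)
The numpy_indexed package (disclaimer: I am its author) contains a vectorized nd-array variant of list.index, which can be used to solve your problem efficiently and concisely: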
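import numpy as np
import numpy_indexed as npi

# codeTable and data as defined in the question
map_keys = np.array(list(codeTable.keys()))      # shape (n_keys, 4)
map_values = np.array(list(codeTable.values()))  # shape (n_keys,)
# position of each pixel's quadruplet in map_keys, masked where it is absent
indices = npi.indices(map_keys, data.reshape(4, -1).T, missing='mask')
remapped = np.where(indices.mask, -9999, map_values[indices.data]).reshape(data.shape[1:])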
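If an extra dependency is not an option, the same "vectorized list.index" idea can be sketched in plain NumPy. This is a swapped-in technique, not numpy_indexed's API and not necessarily how it works internally. It assumes every channel is an integer in 0..255 and the table is non-empty, packs each RGBA quadruplet into a single uint32 key, sorts the table keys, and finds each pixel's key with `np.searchsorted`. The helper names `pack_rgba` and `remap` are hypothetical.

```python
import numpy as np

codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

data = np.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

def pack_rgba(channels):
    """Hypothetical helper: pack the 4 channels along axis 0 into one uint32 per item.

    Assumes every channel value fits in 8 bits (0..255)."""
    c = np.asarray(channels, dtype=np.uint32)
    return (c[0] << 24) | (c[1] << 16) | (c[2] << 8) | c[3]

def remap(data, table, missing=-9999.0):
    """Hypothetical helper: map each RGBA pixel of a (4, rows, cols) array to a float."""
    keys = pack_rgba(np.array(list(table.keys())).T)  # shape (n_keys,)
    values = np.array(list(table.values()), dtype=np.float32)
    order = np.argsort(keys)
    sorted_keys, sorted_values = keys[order], values[order]

    pixel_keys = pack_rgba(data)  # shape (rows, cols)
    # binary search for each pixel key; clip so unmatched keys still index safely
    pos = np.searchsorted(sorted_keys, pixel_keys)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    found = sorted_keys[pos] == pixel_keys
    return np.where(found, sorted_values[pos], np.float32(missing))

print(remap(data, codeTable))
```

Because each pixel becomes one integer, the lookup is one binary search per pixel performed in compiled code, roughly O(N log K) for N pixels and K table entries, instead of a Python-level dict lookup per pixel.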