I have an integer array of length 150 whose values range from 1 to 3. For example,
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
I want to convert / map / transform
1 to [0,0,1]
2 to [0,1,0]
3 to [1,0,0]
Is there an efficient way to do this?
The output should look like
[0,0,1],[0,0,1],[0,0,1]...[1,0,0]
Answer 0 (score: 7)
First, encode the transformation as an array (since you have no mapping for 0, use a dummy first element):
>>> mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])
Then it is trivial, because indexing the table with the integer array picks one row per element:
>>> arr = np.array([1,1,2,3,3,3])
>>> mapping[arr]
array([[0, 0, 1],
[0, 0, 1],
[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]])
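As a rough sketch of the same idea without typing the table by hand, the lookup table can be built from a column-reversed identity matrix. The helper name `build_mapping` is hypothetical (it does not appear in the answer), and it assumes the values are integers in 1..n laid out as in the question, with value k putting its single 1 at column n - k.

```python
import numpy as np

def build_mapping(n):
    # Hypothetical helper, not part of the original answer.
    # Row 0 stays all zeros as the dummy entry, because 0 is never mapped.
    mapping = np.zeros((n + 1, n), dtype=int)
    # Reversing the columns of the identity matrix puts the 1 for value k
    # at column n - k, which matches 1 -> [0,0,1], 2 -> [0,1,0], 3 -> [1,0,0].
    mapping[1:] = np.eye(n, dtype=int)[:, ::-1]
    return mapping

mapping = build_mapping(3)
arr = np.array([1, 1, 2, 3, 3, 3])
result = mapping[arr]  # integer-array indexing selects one row per element
```

This only saves effort when there are many distinct values; for three values the hand-written table is arguably clearer.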
Answer 1 (score: 6)
You really only need to compare the values and set the appropriate entries:
>>> # a bit shorter so it's easier to demonstrate
>>> arr = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
>>> arr2 = np.zeros([arr.size, 3], arr.dtype)
>>> arr2[:, 0] = arr == 3
>>> arr2[:, 1] = arr == 2
>>> arr2[:, 2] = arr == 1
>>> arr2
array([[0, 0, 1],
[0, 0, 1],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]])
You said you are interested in efficiency, so I ran some timings:
import numpy as np
# mine_numba is defined at the end of this answer
my_dict = {
    1: [0, 0, 1],
    2: [0, 1, 0],
    3: [1, 0, 0],
}
mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])
def mine(arr):
    arr2 = np.zeros([arr.size, 3], arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2
def JoaoAreias(arr):
    return [my_dict[i] for i in arr]
def JohnZwinck(arr):
    return mapping[arr]
def Divakar(arr):
    return (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)
def Divakar2(arr):
    return np.take(mapping, arr, axis=0)
arr = np.random.randint(1, 4, (150))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr) # 5. - 10000 loops, best of 3: 48.3 µs per loop
%timeit JoaoAreias(arr) # 6. - 10000 loops, best of 3: 179 µs per loop
%timeit JohnZwinck(arr) # 3. - 10000 loops, best of 3: 24.1 µs per loop
%timeit mine_numba(arr) # 1. - 100000 loops, best of 3: 6.02 µs per loop
%timeit Divakar(arr) # 4. - 10000 loops, best of 3: 34.2 µs per loop
%timeit Divakar2(arr) # 2. - 100000 loops, best of 3: 13.5 µs per loop
arr = np.random.randint(1, 4, (10000))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr) # 4. - 1000 loops, best of 3: 201 µs per loop
%timeit JoaoAreias(arr) # 6. - 100 loops, best of 3: 10.2 ms per loop
%timeit JohnZwinck(arr) # 5. - 1000 loops, best of 3: 455 µs per loop
%timeit mine_numba(arr) # 1. - 10000 loops, best of 3: 103 µs per loop
%timeit Divakar(arr) # 3. - 10000 loops, best of 3: 155 µs per loop
%timeit Divakar2(arr) # 2. - 10000 loops, best of 3: 146 µs per loop
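So it depends on your data size: if it is fairly small, @JohnZwinck has the fastest solution; for "bigger" datasets my approach wins. :)
Actually, if you are willing to use numba (or alternatively cython or similar), you can beat all the other approaches: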
import numpy as np
import numba as nb
@nb.njit
def mine_numba(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    for idx in range(arr.size):
        item = arr[idx]
        if item == 1:
            arr2[idx, 2] = 1
        elif item == 2:
            arr2[idx, 1] = 1
        else:
            arr2[idx, 0] = 1
    return arr2
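The `%timeit` lines above only work inside IPython or Jupyter. As a minimal sketch, a similar comparison can be run as a plain script with the standard `timeit` module. The names `compare_columns` and `lookup_table` are placeholders chosen here for two of the approaches, and the absolute numbers will vary with the machine and the NumPy version, so no expected timings are shown.

```python
import timeit
import numpy as np

mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])

def compare_columns(arr):
    # Column-wise comparison approach from this answer.
    out = np.zeros((arr.size, 3), arr.dtype)
    out[:, 0] = arr == 3
    out[:, 1] = arr == 2
    out[:, 2] = arr == 1
    return out

def lookup_table(arr):
    # Lookup-table indexing approach from the first answer.
    return mapping[arr]

arr = np.random.randint(1, 4, 10000)
# Check that both approaches agree before timing them.
np.testing.assert_array_equal(compare_columns(arr), lookup_table(arr))

timings = {}
for func in (compare_columns, lookup_table):
    # Best of 3 repeats of 100 calls each, converted to microseconds per call.
    best = min(timeit.repeat(lambda: func(arr), number=100, repeat=3)) / 100
    timings[func.__name__] = best * 1e6
    print(f"{func.__name__}: {timings[func.__name__]:.1f} us per call")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that `%timeit` reports in the listings above.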
Answer 2 (score: 3)
How about this?
a = [1, 1, 1, 2, 2, 2, 3, 3, 3]
b = []
for i in a:
    if i == 1:
        b.append([0,0,1])
    elif i == 2:
        b.append([0,1,0])
    else:
        b.append([1,0,0])
print(b)
#[[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
Answer 3 (score: 3)
I would do it with a dictionary and a list comprehension, like this:
# This is a dictionary to map your values
my_dict = {
    1: [0, 0, 1],
    2: [0, 1, 0],
    3: [1, 0, 0],
}
# This is your original array
my_array = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
# Use a list comprehension to map one to the other
my_new_array = [my_dict[i] for i in my_array]
Answer 4 (score: 2)
Approach #1: Using NumPy broadcasting -
(arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)
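Approach #2: Similar to @John Zwinck's indexing idea, but using np.take along the first axis, which helps here because the indices are heavily repeated. Timings on that are in this previous post.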
mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])
out = np.take(mapping, arr,axis=0)
Runtime test using @MSeifert's benchmark setup -
In [85]: arr = np.random.randint(1, 4, (10000))
In [86]: %timeit MSeifert(arr)
...: %timeit JoaoAreias(arr)
...: %timeit JohnZwinck(arr)
...:
10000 loops, best of 3: 105 µs per loop
100 loops, best of 3: 2.97 ms per loop
1000 loops, best of 3: 240 µs per loop
# Approach #1
In [87]: %timeit (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)
10000 loops, best of 3: 44.1 µs per loop
# Approach #2
In [88]: %timeit np.take(mapping, arr,axis=0)
10000 loops, best of 3: 73 µs per loop
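To make the broadcasting in Approach #1 easier to follow, here is a small sketch that splits the one-liner into its intermediate steps. The variable names `targets`, `mask`, `out` and `out_alt` are illustrative only, and `out_alt` is an equivalent formulation shown for comparison rather than something taken from the answer.

```python
import numpy as np

arr = np.array([1, 1, 2, 3])

# Column vector of the values to test against, in output-column order.
targets = np.arange(3, 0, -1)[:, None]   # shape (3, 1): [[3], [2], [1]]

# Comparing shape (4,) with shape (3, 1) broadcasts to shape (3, 4):
# row 0 marks where arr == 3, row 1 where arr == 2, row 2 where arr == 1.
mask = arr == targets

# Transpose so that each input element gets its own row, then cast to int8.
out = mask.T.astype(np.int8)             # shape (4, 3)

# Equivalent without the transpose: put the new axis on arr instead.
out_alt = (arr[:, None] == np.arange(3, 0, -1)).astype(np.int8)
```

Both forms give one row per input element; note that the result dtype here is `np.int8`, unlike the indexing approach, which keeps the dtype of `mapping`.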
Answer 5 (score: 1)
A solution using a list comprehension, given that your range is 1 to 3:
>>> [([0,0,1] if x==1 else [0,1,0] if x==2 else [1,0,0]) for x in c]
[[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
This is more Pythonic, and fast.