I'm having trouble arranging data in numpy. For example, a contains the list of data ranges (how many values go in each row):
numpy.array([1,3,5,4,6])
I have the data:
numpy.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
I need to arrange the data into
numpy.array([
[1,9999,9999,9999,9999,9999],
[2,3,4,9999,9999,9999],
[5,6,7,8,9,9999],
[10,11,12,13,9999,9999],
[14,15,16,17,18,19]
])
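I think it is somewhat similar to the diag/diagonal/trace functions.
I usually do this with basic iteration... Does numpy have a function for this, so that it runs faster?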
Answer 0 (score: 3)
Here are some ways to arrange the data:
from numpy import arange, array, ones, r_, zeros
from numpy.random import randint

def gen_tst(m, n):
    # a: random row lengths, b: flat data, c: output pre-filled with 999
    a= randint(1, n, m)
    b, c= arange(a.sum()), ones((m, n), dtype= int)* 999
    return a, b, c

def basic_1(a, b, c):
    # some assumed basic iteration based
    n= 0
    for k in range(len(a)):
        m= a[k]
        c[k, :m], n= b[n: n+ m], n+ m

def advanced_1(a, b, c):
    # based on Sven's answer
    cum_a= r_[0, a.cumsum()]
    i= arange(len(a)).repeat(a)
    j= arange(cum_a[-1])- cum_a[:-1].repeat(a)
    c[i, j]= b

def advanced_2(a, b, c):
    # other loopless version
    c[arange(c.shape[1])+ zeros((len(a), 1), dtype= int)< a[:, None]]= b
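The boolean-mask idea behind advanced_2 may be easier to see on the data from the question. The following is only a minimal sketch: the names lengths, mask and result are illustrative and not part of the original answer, and it relies on broadcasting directly instead of adding the zeros array.

```python
import numpy as np

# Data from the question: row lengths and the flat values to distribute.
lengths = np.array([1, 3, 5, 4, 6])
data = np.arange(1, 20)

# mask[k, col] is True when column `col` falls inside the first lengths[k]
# entries of row k. Broadcasting compares a (6,) row against a (5, 1) column.
mask = np.arange(lengths.max()) < lengths[:, None]

# Boolean assignment fills the True positions in row-major order,
# which is the same order in which `data` is laid out.
result = np.full((len(lengths), lengths.max()), 9999)
result[mask] = data
print(result)
```

The mask has exactly lengths.sum() True entries, so the assignment consumes every value in data once.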
And some timings:
In []: m, n= 10, 100
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.531
In []: %timeit advanced_1(a, b, c)
10000 loops, best of 3: 99.2 us per loop
In []: %timeit advanced_2(a, b, c)
10000 loops, best of 3: 68 us per loop
In []: %timeit basic_1(a, b, c)
10000 loops, best of 3: 47.1 us per loop
In []: m, n= 50, 500
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.455
In []: %timeit advanced_1(a, b, c)
1000 loops, best of 3: 1.03 ms per loop
In []: %timeit advanced_2(a, b, c)
1000 loops, best of 3: 1.06 ms per loop
In []: %timeit basic_1(a, b, c)
1000 loops, best of 3: 227 us per loop
In []: m, n= 250, 2500
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.486
In []: %timeit advanced_1(a, b, c)
10 loops, best of 3: 30.4 ms per loop
In []: %timeit advanced_2(a, b, c)
10 loops, best of 3: 32.4 ms per loop
In []: %timeit basic_1(a, b, c)
1000 loops, best of 3: 2 ms per loop
So basic iteration seems to be quite efficient.
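To repeat this comparison outside IPython, something along the following lines could be used. It is a sketch under assumptions: the helper names (make_case, fill_loop, fill_indexed, fill_masked, time_all) are made up here, the standard timeit module stands in for %timeit, and the numbers will depend on the machine and the numpy version, so none are quoted.

```python
import timeit
import numpy as np

def make_case(m, n, seed=0):
    # Same shape of test case as gen_tst above; the fixed seed is an assumption.
    rng = np.random.RandomState(seed)
    a = rng.randint(1, n, m)
    b = np.arange(a.sum())
    c = np.full((m, n), 999)
    return a, b, c

def fill_loop(a, b, c):
    # Plain Python loop over the rows.
    n = 0
    for k, m in enumerate(a):
        c[k, :m] = b[n:n + m]
        n += m

def fill_indexed(a, b, c):
    # Integer (advanced) indexing with explicit row and column indices.
    cum_a = np.r_[0, a.cumsum()]
    i = np.arange(len(a)).repeat(a)
    j = np.arange(cum_a[-1]) - cum_a[:-1].repeat(a)
    c[i, j] = b

def fill_masked(a, b, c):
    # Boolean mask built by broadcasting.
    c[np.arange(c.shape[1]) < a[:, None]] = b

def time_all(m, n, repeats=3, number=100):
    # Best average time per call, in seconds, for each variant.
    a, b, c = make_case(m, n)
    timings = {}
    for func in (fill_loop, fill_indexed, fill_masked):
        best = min(timeit.repeat(lambda: func(a, b, c),
                                 repeat=repeats, number=number))
        timings[func.__name__] = best / number
    return timings

if __name__ == "__main__":
    for m, n in [(10, 100), (50, 500)]:
        print(m, n, time_all(m, n))
```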
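Update:
Of course, the performance of the basic iteration-based implementation can still be improved further. As a starting point, consider for example this (basic iteration with fewer additions):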
def basic_2(a, b, c):
    n= 0
    for k, m in enumerate(a):
        nm= n+ m
        c[k, :m], n= b[n: nm], nm
Answer 1 (score: 1)
Here is how to do this with advanced indexing, without any Python loops:
import numpy

r = numpy.array([1,3,5,4,6])
data = numpy.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
# Output array, pre-filled with the padding value
result = numpy.empty((len(r), r.max()), data.dtype)
result.fill(9999)
# Start offset of each row within data
cum_r = numpy.r_[0, r.cumsum()]
# Row index and column index for every element of data
i = numpy.arange(len(r)).repeat(r)
j = numpy.arange(cum_r[-1]) - cum_r[:-1].repeat(r)
result[i, j] = data
print(result)
This prints
[[ 1 9999 9999 9999 9999 9999]
[ 2 3 4 9999 9999 9999]
[ 5 6 7 8 9 9999]
[ 10 11 12 13 9999 9999]
[ 14 15 16 17 18 19]]
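To see why result[i, j] = data lands every value in the right cell, it may help to look at the intermediate arrays for the data from the question. This is only an illustration of the code above; the values in the comments were worked out by hand from r = [1, 3, 5, 4, 6].

```python
import numpy as np

r = np.array([1, 3, 5, 4, 6])

# Start offset of each row within the flat data, plus the total length.
cum_r = np.r_[0, r.cumsum()]
# Row index for each data element: row k is repeated r[k] times.
i = np.arange(len(r)).repeat(r)
# Column index: position in the flat data minus the start offset of its row.
j = np.arange(cum_r[-1]) - cum_r[:-1].repeat(r)

print(cum_r)  # values: 0, 1, 4, 9, 13, 19
print(i)      # values: 0, 1,1,1, 2,2,2,2,2, 3,3,3,3, 4,4,4,4,4,4
print(j)      # values: 0, 0,1,2, 0,1,2,3,4, 0,1,2,3, 0,1,2,3,4,5
```

Each pair (i[k], j[k]) is the target cell of data[k], so a single advanced-indexing assignment replaces the loop.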
Answer 2 (score: 0)
Once again, Sven beat us all :) My humble attempt follows,
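from numpy import arange,array,split
from numpy import concatenate as cat
from numpy import repeat as rep
a = arange(1,20)
i = array([1,3,5,4,6])
# number of padding values needed per row
j = max(i) - i
# split the data at the cumulative lengths (the last piece is empty)
s = split(a,i.cumsum())
# pad every piece with 9999 up to the longest row
z = array([cat((t,rep(9999,k))) for t,k in zip(s[:-1],j)])
print(z)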
which gives,
[[ 1 9999 9999 9999 9999 9999]
[ 2 3 4 9999 9999 9999]
[ 5 6 7 8 9 9999]
[ 10 11 12 13 9999 9999]
[ 14 15 16 17 18 19]]
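The s[:-1] in the code above is needed because splitting at every cumulative length, including the last one, leaves an empty piece at the end. A small sketch of that detail, with with_tail and without_tail as illustrative names only:

```python
import numpy as np

a = np.arange(1, 20)
i = np.array([1, 3, 5, 4, 6])

# Splitting at every cumulative length, including the final one (19),
# produces one extra, empty piece at the end.
with_tail = np.split(a, i.cumsum())
# Dropping the last split point avoids the empty piece.
without_tail = np.split(a, i.cumsum()[:-1])

print(len(with_tail), with_tail[-1].size)  # 6 pieces, the last one has size 0
print(len(without_tail))                   # 5 pieces
```

Either form works; dropping the last split point just removes the need to slice the list afterwards.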