I have an n×3 index array (think of triangles indexing points) and a list of float values associated with the triangles. I now want to get the minimum value for each index ("point"). That is, I want to check all rows that contain the index, say 0, and take the minimum of vals over those rows:
import numpy
a = numpy.array([
[0, 1, 2],
[2, 3, 0],
[1, 4, 2],
[2, 5, 3],
])
vals = numpy.array([0.1, 0.5, 0.3, 0.6])
out = [
numpy.min(vals[numpy.any(a == i, axis=1)])
for i in range(6)
]
# out = numpy.array([0.1, 0.1, 0.1, 0.5, 0.3, 0.6])
This solution is inefficient because it does a full array comparison for every i.
This problem is very similar to what numpy's ufuncs offer, but there is no numpy.min.at.
Any hints?
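Answer 0 (score: 1)
If your for loop goes well beyond 6, you can switch to pd.GroupBy or itertools.groupby.
For example,
import numpy as np
import pandas as pd
r = a.ravel()  # flat list of point indices
# group the row number of every flat position by its point index, then take the min of vals per group
pd.Series(np.arange(len(r))//3).groupby(r).apply(lambda s: vals[s].min())
This solution should be faster for long loops and may be slower for short ones (< 50).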
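The answer names itertools.groupby as an alternative but only shows the pandas version. Below is a minimal pure-Python sketch of the itertools.groupby route, assuming a and vals as in the question. The function name min_per_index_groupby is hypothetical. It mainly illustrates the grouping logic. Because it is pure Python, it is probably not the fastest option for large arrays.

```python
import itertools
import numpy as np

def min_per_index_groupby(a, vals):
    # Hypothetical helper: pair every index in a with its row's value.
    flat = a.ravel()
    rows = np.arange(flat.size) // a.shape[1]  # row of each flat position
    pairs = sorted(zip(flat.tolist(), vals[rows].tolist()))
    # groupby only merges adjacent equal keys, so the input must be sorted by index (done above)
    return {idx: min(v for _, v in grp)
            for idx, grp in itertools.groupby(pairs, key=lambda p: p[0])}

a = np.array([[0, 1, 2], [2, 3, 0], [1, 4, 2], [2, 5, 3]])
vals = np.array([0.1, 0.5, 0.3, 0.6])
print(min_per_index_groupby(a, vals))
# {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.5, 4: 0.3, 5: 0.6}
```

The result is a dict, so indices that never occur simply have no entry.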
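Answer 1 (score: 1)
Approach 1
One approach is based on array assignment. Set up a 2D array filled with NaNs, use the values of a as column indices (this assumes they are integers), map vals into it, and then take the NaN-skipping minimum of each column for the final output -
import numpy as np
nr,nc = len(a),a.max()+1
m = np.full((nr,nc),np.nan)  # one row per row of a, one column per index
m[np.arange(nr)[:,None],a] = vals[:,None]  # write each row's value into the columns it references
out = np.nanmin(m,axis=0)  # column-wise minimum, ignoring NaNs
Approach 2
Another approach is also based on array assignment, but it uses masking and np.minimum.reduceat instead of dealing with NaNs -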
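import numpy as np
nr,nc = len(a),a.max()+1
m = np.zeros((nc,nr),dtype=bool)
m[a.T,np.arange(nr)] = 1  # m[i,r] is True when row r contains index i
c = m.sum(1)  # number of rows containing each index
shift_idx = np.r_[0,c[:-1].cumsum()]  # start of each index's group (assumes every index 0..a.max() occurs)
out = np.minimum.reduceat(np.broadcast_to(vals,m.shape)[m],shift_idx)
Approach 3
Another approach is based on argsort (assuming a contains every integer from 0 to a.max()) -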
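import numpy as np
sidx = a.ravel().argsort()  # flat positions, grouped by index value
c = np.bincount(a.ravel())  # number of occurrences of each index
out = np.minimum.reduceat(vals[sidx//a.shape[1]],np.r_[0,c[:-1].cumsum()])  # sidx//ncols is the row
Approach 4
For memory efficiency, and hence performance, and to complete the set -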
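The code for approach 4 did not survive on this page. Answer 3 describes it as essentially a C implementation that requires jitting, and it times it under the label "nmb", so it appears to be a numba-compiled loop. The following is a sketch of such a single-pass loop, not the original code. The names min_per_index_loop and min_per_index are hypothetical, and numba is treated as optional. Without numba, the same loop runs as slow plain Python.

```python
import numpy as np

try:
    from numba import njit  # optional JIT compiler
except ImportError:
    def njit(f):  # fallback: leave the function uncompiled
        return f

@njit
def min_per_index_loop(a, vals, out):
    # Single pass over a: keep the smallest value seen so far for each index.
    m, n = a.shape
    for j in range(m):
        v = vals[j]
        for i in range(n):
            e = a[j, i]
            if v < out[e]:
                out[e] = v
    return out

def min_per_index(a, vals):
    out = np.full(a.max() + 1, np.inf)  # indices that never occur stay at inf
    return min_per_index_loop(a, vals, out)

a = np.array([[0, 1, 2], [2, 3, 0], [1, 4, 2], [2, 5, 3]])
vals = np.array([0.1, 0.5, 0.3, 0.6])
print(min_per_index(a, vals))
```

Unlike approaches 2 and 3, this loop does not need every index from 0 to a.max() to occur. It allocates only the output array.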
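Answer 2 (score: 0)
numpy.unique with the parameter return_index=True gives the indices of the first occurrences (that is, the smallest flat indices) of the unique elements in the array: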
import numpy as np
a = np.array([
[0, 1, 2],
[2, 3, 0],
[1, 4, 2],
[2, 5, 3]])
u, index = np.unique(a, return_index = True)
# index = [ 0 1 2 4 7 10]
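Since array a has 3 columns, index//3 gives the row index of each unique element's first occurrence. Looking up vals at those rows yields the per-index minimum only when vals is sorted in ascending order, as it is in this example: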
import numpy as np
a = np.array([
[0, 1, 2],
[2, 3, 0],
[1, 4, 2],
[2, 5, 3]])
vals = np.array([0.1, 0.5, 0.3, 0.6])
u, index = np.unique(a, return_index = True)
out = vals[index//3]
# [0.1 0.1 0.1 0.5 0.3 0.6]
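The first-occurrence trick picks the value of the first row containing each index. That value is the minimum only if vals is ascending. A minimal sketch that makes it general sorts the rows by vals first, so that the first occurrence is always the smallest value. The helper name min_per_index_unique is hypothetical, and the code relies on np.unique's documented return_index behaviour (indices of first occurrences).

```python
import numpy as np

def min_per_index_unique(a, vals):
    # Hypothetical helper: reorder rows by ascending value so that the first occurrence is the minimum.
    order = np.argsort(vals, kind="stable")
    a_sorted = a[order]
    u, index = np.unique(a_sorted, return_index=True)  # first flat position of each index
    rows = index // a.shape[1]  # works for any number of columns
    return u, vals[order][rows]

a = np.array([[0, 1, 2], [2, 3, 0], [1, 4, 2], [2, 5, 3]])
vals = np.array([0.6, 0.5, 0.3, 0.1])  # deliberately not sorted
u, out = min_per_index_unique(a, vals)
print(u, out)
```

Only indices that actually occur are returned, with u giving the labels. With this unsorted vals, the unmodified vals[index//3] would return 0.6 for index 0 instead of the true minimum 0.5.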
Answer 3 (score: 0)
Here is one based on this Q&A:
If you have pythran, compile the file stb_pthr.py:
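import numpy as np

#pythran export sort_to_bins(int[:], int)
def sort_to_bins(idx, mx):
    # Counting sort: positions of idx grouped by label, plus bin boundaries.
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 2, int)
    for i in range(idx.size):
        cnts[idx[i]+2] += 1  # histogram, shifted by two slots
    for i in range(2, cnts.size):
        cnts[i] += cnts[i-1]  # cnts[k+1] is now the start offset of bin k
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]+1]] = i  # put position i into its bin
        cnts[idx[i]+1] += 1  # advance that bin's write pointer
    return res, cnts[:-1]  # the mx+1 bin boundaries
Otherwise, the script falls back to an approach based on sparse matrices, which is only slightly slower: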
import numpy as np
try:
    from stb_pthr import sort_to_bins
    HAVE_PYTHRAN = True
except ImportError:
    HAVE_PYTHRAN = False
from scipy.sparse import csr_matrix

def sort_to_bins_sparse(idx, mx):
    # fallback: one nonzero per position (row i, column idx[i]); CSC groups the rows by column
    if mx==-1:
        mx = idx.max() + 1
    aux = csr_matrix((np.ones_like(idx),idx,np.arange(idx.size+1)),
                     (idx.size,mx)).tocsc()
    return aux.indices, aux.indptr

if not HAVE_PYTHRAN:
    sort_to_bins = sort_to_bins_sparse

def f_op():
    # the question's approach: one full comparison per index
    mx = a.max() + 1
    return np.fromiter((np.min(vals[np.any(a == i, axis=1)])
                        for i in range(mx)),vals.dtype,mx)

def f_pp():
    idx, bb = sort_to_bins(a.reshape(-1),-1)
    res = np.minimum.reduceat(vals[idx//3], bb[:-1])
    res[bb[:-1]==bb[1:]] = np.inf  # indices that never occur
    return res

def f_div_3():
    sidx = a.ravel().argsort()
    c = np.bincount(a.ravel())
    bb = np.r_[0,c.cumsum()]
    res = np.minimum.reduceat(vals[sidx//a.shape[1]],bb[:-1])
    res[bb[:-1]==bb[1:]] = np.inf
    return res

a = np.array([
    [0, 1, 2],
    [2, 3, 0],
    [1, 4, 2],
    [2, 5, 3],
])
vals = np.array([0.1, 0.5, 0.3, 0.6])
assert np.all(f_op()==f_pp())

from timeit import timeit

a = np.random.randint(0,1000,(10000,3))
vals = np.random.random(10000)
assert len(np.unique(a))==1000
assert np.all(f_op()==f_pp())
print("1000/1000 labels, 10000 rows")
print("op ", timeit(f_op, number=10)*100, 'ms')
print("pp ", timeit(f_pp, number=100)*10, 'ms')
print("div", timeit(f_div_3, number=100)*10, 'ms')

a = 1 + 2 * np.random.randint(0,5000,(1000000,3))
vals = np.random.random(1000000)
nl = len(np.unique(a))
assert np.all(f_div_3()==f_pp())
print(f"{nl}/{a.max()+1} labels, 1000000 rows")
print("pp ", timeit(f_pp, number=10)*100, 'ms')
print("div", timeit(f_div_3, number=10)*100, 'ms')

a = 1 + 2 * np.random.randint(0,100000,(1000000,3))
vals = np.random.random(1000000)
nl = len(np.unique(a))
assert np.all(f_div_3()==f_pp())
print(f"{nl}/{a.max()+1} labels, 1000000 rows")
print("pp ", timeit(f_pp, number=10)*100, 'ms')
print("div", timeit(f_div_3, number=10)*100, 'ms')
Sample run (timings include @Divakar's approach 3 for reference):
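1000/1000 labels, 10000 rows
op 145.1122640981339 ms
pp 0.7944229000713676 ms
div 2.2905819199513644 ms
5000/10000 labels, 1000000 rows
pp 113.86540920939296 ms
div 417.2476712032221 ms
100000/200000 labels, 1000000 rows
pp 158.23634970001876 ms
div 486.13436080049723 ms
Update: @Divakar's latest approach (approach 4) is hard to beat, since it is essentially a C implementation. There is nothing wrong with that, except that jitting is not optional here but a requirement (running the non-jitted code is no fun). If one accepts that, the same can of course be done with pythran: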
pythran -O3 labeled_min.py
File labeled_min.py:
import numpy as np

#pythran export labeled_min(int[:,:], float[:])
def labeled_min(A, vals):
    mn = np.empty(A.max()+1)
    mn[:] = np.inf  # indices that never occur keep inf
    M,N = A.shape
    for i in range(M):
        v = vals[i]  # look up the row value once, outside the inner loop
        for j in range(N):
            c = A[i,j]
            if v < mn[c]:
                mn[c] = v
    return mn
Both give another massive speedup. Sample run:
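# continues the session above (a, vals and timeit are already defined);
# func1 is the jitted function from @Divakar's approach 4
from labeled_min import labeled_min
func1() # do not measure jitting time
print("nmb ", timeit(func1, number=100)*10, 'ms')
print("pthr", timeit(lambda:labeled_min(a,vals), number=100)*10, 'ms')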
nmb 8.41792532010004 ms
pthr 8.104007659712806 ms
pythran is a few percent faster, but only because I moved the vals lookup out of the inner loop. Without that change, the two are practically equal.
For comparison, here are the previous best results on the same problem, with and without non-Python helpers:
Answer 4 (score: 0)
Apparently, numpy.minimum.at exists:
import numpy
a = numpy.array([
[0, 1, 2],
[2, 3, 0],
[1, 4, 2],
[2, 5, 3],
])
vals = numpy.array([0.1, 0.5, 0.3, 0.6])
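out = numpy.full(6, numpy.inf)  # start every index at +inf
# unbuffered in-place minimum: repeated indices are all applied;
# numpy.repeat gives each row's value once per column, matching a.reshape(-1)
numpy.minimum.at(out, a.reshape(-1), numpy.repeat(vals, 3))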
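The same idea can skip the flattening and the numpy.repeat copy by passing a itself as the index array and vals[:, None] as the values. This relies on ufunc.at broadcasting its value operand over the indexed shape, which the NumPy documentation describes. Still, treat the snippet as a sketch and check it on your NumPy version. The helper name min_per_index_at and its n_points parameter are hypothetical.

```python
import numpy as np

def min_per_index_at(a, vals, n_points=None):
    # n_points (hypothetical parameter): output length, defaults to a.max() + 1
    if n_points is None:
        n_points = a.max() + 1
    out = np.full(n_points, np.inf)  # indices never referenced stay at inf
    # out[a] has shape a.shape; vals[:, None] broadcasts each row's value across its columns
    np.minimum.at(out, a, vals[:, None])
    return out

a = np.array([[0, 1, 2], [2, 3, 0], [1, 4, 2], [2, 5, 3]])
vals = np.array([0.1, 0.5, 0.3, 0.6])
print(min_per_index_at(a, vals))
```

This works for any number of columns, not just 3, because nothing depends on a.shape[1].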