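NumPy version: 1.14.5
Purpose of the "foo" function:
Question:
Is keeping arrays in a dictionary and iterating over them a particularly bad way to use NumPy arrays?
Advice requested: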
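When I stacked the arrays and kept them separate from the dict, the Euclidean distance calculation took about half the time (~120 ms instead of ~250 ms), but for some reason the overall performance did not change much. Allocating the new array and stacking into it may have cancelled out the benefit of computing on one larger array. I am open to any suggestions.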
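The stacking approach described above can be sketched roughly as follows. This is a minimal illustration, not the code that produced the ~120 ms figure: `stacked_distances` is a hypothetical name, and it assumes each record stores a (1, 512) embedding as its first element, as in the code posted below.

```python
import numpy as np

def stacked_distances(merged_faces_rec, face):
    """Hypothetical helper: distances from `face` to every stored embedding."""
    uids = list(merged_faces_rec.keys())
    # Each record's first element is a (1, 512) embedding; vstack gives (n, 512).
    embeddings = np.vstack([rec[0] for rec in merged_faces_rec.values()])
    # Broadcasting subtracts the (1, 512) query from every row in one step.
    distances = np.linalg.norm(embeddings - face, axis=1)
    return uids, distances

# Small usage example with random data (assumed record layout)
rng = np.random.RandomState(0)
records = {"id%d" % i: [rng.rand(1, 512), "x"] for i in range(5)}
query = records["id0"][0]
uids, distances = stacked_distances(records, query)
closest = uids[int(np.argmin(distances))]
```

If the records change rarely, the stacked array could be built once and reused, so the allocation cost is not paid on every query.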
Code:
import numpy as np
import time
import uuid
import random
from funcy import print_durations
@print_durations
def foo(merged_faces_rec, face):
    t = time.time()
    # Euclidean distance between each stored embedding and the query face
    for uid, feature_list in merged_faces_rec.items():
        dist = np.linalg.norm(np.subtract(feature_list[0], face))
    print("foo inside : ", time.time()-t)
rand_age = lambda : random.choice(["0-18", "18-35", "35-55", "55+"])
rand_gender = lambda : random.choice(["Erkek", "Kadin"])
rand_emo = lambda : random.choice(["happy", "sad", "neutral", "scared"])
date_list = []
emb = lambda : np.random.rand(1, 512)
def generate_faces_rec(d, n=12000):
    for _ in range(n):
        d[uuid.uuid4().hex] = [emb(), rand_gender(), rand_age(), rand_emo(), date_list]
faces_rec1 = dict()
generate_faces_rec(faces_rec1)
faces_rec2 = dict()
generate_faces_rec(faces_rec2)
faces_rec3 = dict()
generate_faces_rec(faces_rec3)
faces_rec4 = dict()
generate_faces_rec(faces_rec4)
faces_rec5 = dict()
generate_faces_rec(faces_rec5)
merged_faces_rec = dict()
st = time.time()
merged_faces_rec.update(faces_rec1)
merged_faces_rec.update(faces_rec2)
merged_faces_rec.update(faces_rec3)
merged_faces_rec.update(faces_rec4)
merged_faces_rec.update(faces_rec5)
t2 = time.time()
print("updates: ", t2-st)
face = list(merged_faces_rec.values())[0][0]
t3 = time.time()
print("face: ", t3-t2)
t4 = time.time()
foo(merged_faces_rec, face)
t5 = time.time()
print("foo: ", t5-t4)
Results:
The call measured between t4 and t5 takes 168 seconds.
updates: 0.00468754768371582
face: 0.0011434555053710938
foo inside : 0.2232837677001953
223.32 ms in foo({'d02d46999aa145be8116..., [[0.96475353 0.8055263...)
foo: 168.42408967018127
cProfile
python3 -m cProfile -s tottime test.py
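The same kind of report can also be produced from inside a script with the standard `cProfile` and `pstats` modules, which makes it easier to profile a single call. The sketch below is only an illustration: `work` is a hypothetical stand-in workload, not the code from this question.

```python
import cProfile
import io
import pstats

import numpy as np

def work():
    # Hypothetical stand-in workload: format some arrays as text
    arrays = [np.random.rand(1, 512) for _ in range(50)]
    return [repr(a) for a in arrays]

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by internal time (same ordering as "-s tottime") and keep the top 10 rows
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)
report = buffer.getvalue()
print(report)
```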
cProfile results:
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
30720512 44.991 0.000 85.425 0.000 arrayprint.py:888(__call__)
36791296 42.447 0.000 42.447 0.000 {built-in method numpy.core.multiarray.dragon4_positional}
30840514/60001 36.154 0.000 149.749 0.002 arrayprint.py:659(recurser)
24649728 25.967 0.000 25.967 0.000 {built-in method numpy.core.multiarray.dragon4_scientific}
30720512 20.183 0.000 26.420 0.000 arrayprint.py:636(_extendLine)
10 12.281 1.228 12.281 1.228 {method 'sub' of '_sre.SRE_Pattern' objects}
60001 11.434 0.000 79.370 0.001 arrayprint.py:804(fillFormat)
228330011/228329975 10.270 0.000 10.270 0.000 {built-in method builtins.len}
204081 4.815 0.000 16.469 0.000 {built-in method builtins.max}
18431577 4.624 0.000 21.742 0.000 arrayprint.py:854(<genexpr>)
18431577 4.453 0.000 28.627 0.000 arrayprint.py:859(<genexpr>)
30720531 3.987 0.000 3.987 0.000 {method 'split' of 'str' objects}
12348936 3.012 0.000 13.873 0.000 arrayprint.py:829(<genexpr>)
12348936 3.007 0.000 17.955 0.000 arrayprint.py:832(<genexpr>)
18431577 2.179 0.000 2.941 0.000 arrayprint.py:863(<genexpr>)
18431577 2.124 0.000 2.870 0.000 arrayprint.py:864(<genexpr>)
12348936 1.625 0.000 3.180 0.000 arrayprint.py:833(<genexpr>)
12348936 1.468 0.000 1.992 0.000 arrayprint.py:834(<genexpr>)
12348936 1.433 0.000 1.922 0.000 arrayprint.py:844(<genexpr>)
12348936 1.432 0.000 1.929 0.000 arrayprint.py:837(<genexpr>)
12324864 1.074 0.000 1.074 0.000 {method 'partition' of 'str' objects}
6845518 0.761 0.000 0.761 0.000 {method 'rstrip' of 'str' objects}
60001 0.747 0.000 80.175 0.001 arrayprint.py:777(__init__)
2 0.637 0.319 245.563 122.782 debug.py:237(smart_repr)
120002 0.573 0.000 0.573 0.000 {method 'reduce' of 'numpy.ufunc' objects}
60001 0.421 0.000 231.153 0.004 arrayprint.py:436(_array2string)
60000 0.370 0.000 0.370 0.000 {method 'rand' of 'mtrand.RandomState' objects}
60000 0.303 0.000 232.641 0.004 arrayprint.py:1334(array_repr)
60001 0.274 0.000 232.208 0.004 arrayprint.py:465(array2string)
60001 0.261 0.000 80.780 0.001 arrayprint.py:367(_get_format_function)
120008 0.255 0.000 0.611 0.000 numeric.py:2460(seterr)
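In the profile above, almost all of the cumulative time sits under `debug.py:237(smart_repr)` and the `arrayprint.py` functions, while the ufunc `reduce` calls account for only 0.573 s. That pattern would be consistent with the time going into building a text representation of the arguments rather than into the distance computation, although this is an inference from the profile and not something confirmed here. One way to test it, sketched below, is a timing decorator that labels the call by function name only; `timed` is a hypothetical helper, not part of funcy.

```python
import functools
import time

import numpy as np

def timed(func):
    """Hypothetical helper: print wall-clock duration without formatting arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # Only the function name is printed, never the argument values
            print("%.2f ms in %s()" % (elapsed * 1000, func.__name__))
    return wrapper

@timed
def foo(merged_faces_rec, face):
    dists = {}
    for uid, feature_list in merged_faces_rec.items():
        dists[uid] = np.linalg.norm(np.subtract(feature_list[0], face))
    return dists

# Small usage example with random data
records = {"id%d" % i: [np.random.rand(1, 512)] for i in range(100)}
result = foo(records, records["id0"][0])
```

If the gap between the inner and outer timings disappears with a decorator like this, the argument formatting is the likely cause.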
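Update to clarify the problem
This is the part with the bug. Something behind the scenes makes the program take far too long. Is it related to the garbage collector, or is it a strange NumPy bug? I have no clue.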
t6 = time.time()
foo1(big_array, face) # 223.32ms
t7 = time.time()
print("foo1 : ", t7-t6) # foo1 : 170 seconds