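Is there a better way to loop through numpy arrays? I have timed my nested iteration, and each loop takes about 40-50 seconds. Is there a faster way to do this? I know that looping over a numpy array is not ideal, but I'm at my wits' end! I have looked through many questions on Stack Overflow, but they all ended up confusing me even more.
I tried converting the numpy array to a list with tolist(), but the run time was just as slow, if not worse.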
import numpy as np

def euc_distance(array1, array2):
    # Euclidean distance between two equal-length vectors
    return np.power(np.sum((array1 - array2)**2), 0.5)

for i in range(N):
    for j, n in enumerate(data2.values):
        distance = euc_distance(n, D[i])
        if distance < Dradius[i] and NormAttListTest[j] == "Attack":
            TP += 1
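My euc_distance function takes two arrays (5-dimensional in my case) and returns a single scalar value. data2.values is how I access the numpy array behind my pandas DataFrame, which has shape [500 000, 5].
(Note that NormAttListTest is a list that labels each individual test sample as either "Attack" or "Normal".)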
Answer 0 (score: 0)
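Your problem is that you are using numpy the wrong way: numpy, like MATLAB, is built around vectorized computation. Consider the following modification of your code. I replaced the inner loop over the numpy array with simple numpy code that makes effective use of vectorization on 2-D arrays. As a result, the code runs roughly 100 times faster.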
import functools
import numpy as np
import time


# decorator to measure running time
def measure_running_time(echo=True):
    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            t_1 = time.time()
            ans = func(*args, **kwargs)
            t_2 = time.time()
            if echo:
                print(f'{func.__name__}() running time is {t_2 - t_1:.2f} s')
            return ans
        return wrapped
    return decorator


def euc_distance(array1, array2):
    return np.power(np.sum((array1 - array2) ** 2), 0.5)


# original function
@measure_running_time()
def calculate_TP_1(N, data2, D, Dradius, NormAttListTest, TP=0):
    for i in range(N):
        for j, n in enumerate(data2):
            distance = euc_distance(n, D[i])
            if distance < Dradius[i] and NormAttListTest[j] == "Attack":
                TP += 1
    return TP


# new version
@measure_running_time()
def calculate_TP_2(N, data2, D, Dradius, NormAttListTest, TP=0):
    # this condition is the same for every i, so compute the boolean mask once
    NormAttListTest = np.array([val == 'Attack' for val in NormAttListTest])
    for i in range(N):
        # don't loop over the rows of a numpy array;
        # compute the distance for all rows at once
        distance = np.sum((data2 - D[i]) ** 2, axis=1) ** .5
        # check both conditions for all rows and count the matches
        TP += np.sum((distance < Dradius[i]) & (NormAttListTest))
    return TP


if __name__ == '__main__':
    N = 10
    NN = 100_000
    D = np.random.randint(0, 10, (N, 5))
    Dradius = np.random.randint(0, 10, (N,))
    NormAttListTest = ['Attack'] * NN
    NormAttListTest[:NN // 2] = ['Defence'] * (NN // 2)
    data2 = np.random.randint(0, 10, (NN, 5))
    print(calculate_TP_1(N, data2, D, Dradius, NormAttListTest))
    print(calculate_TP_2(N, data2, D, Dradius, NormAttListTest))
Output (the exact count varies from run to run because the test data is random):
calculate_TP_1() running time is 7.24 s
96476
calculate_TP_2() running time is 0.06 s
96476
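If N stays small, the remaining loop over i can in principle be removed as well by letting numpy broadcast D against data2. The following is only a minimal sketch of that idea, not something timed in this answer: the function name calculate_tp_broadcast and the parameter name labels are made up for illustration, and the intermediate array has shape (N, M, 5), so with 500 000 rows it can use a lot of memory.

```python
import numpy as np


def calculate_tp_broadcast(data2, D, Dradius, labels):
    """Count (centre, row) pairs where the row is labelled 'Attack'
    and lies strictly inside that centre's radius.

    Hypothetical helper for illustration; names are not from the question.
    """
    data2 = np.asarray(data2, dtype=float)        # shape (M, d)
    D = np.asarray(D, dtype=float)                # shape (N, d)
    radius = np.asarray(Dradius, dtype=float)     # shape (N,)
    is_attack = np.asarray(labels) == "Attack"    # shape (M,), boolean mask

    # Broadcasting: (N, 1, d) - (1, M, d) -> (N, M, d) pairwise differences
    diff = D[:, None, :] - data2[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))     # shape (N, M)

    inside = dist < radius[:, None]               # shape (N, M)
    return int(np.sum(inside & is_attack[None, :]))


if __name__ == "__main__":
    # One centre at the origin with radius 2; the rows are at distance 1 and 5.
    D = np.array([[0, 0]])
    data2 = np.array([[0, 1], [3, 4]])
    print(calculate_tp_broadcast(data2, D, [2], ["Attack", "Attack"]))  # 1
```

Whether this beats the per-i loop depends on N and on available memory; for large inputs, processing data2 in chunks or keeping the loop over i is likely the safer choice.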