Question

In a Monte Carlo simulation, I create many lists of random stick coordinates (in fact, each repetition produces two coordinate lists, one for each of two stick types). I try to minimize the creation time by using vectorized numpy methods. However, under certain conditions the arrays grow beyond 10 million elements, and this generation step becomes the bottleneck.

The following code gives a minimal example with some test values:

import numpy as np

def create_coordinates_vect(dimensions=[1500,2500], length=50, count=12000000, type1_content=0.001):
    # two arrays with random start coordinates in area of dimensions
    x0 = np.random.randint(dimensions[0], size=count)
    y0 = np.random.randint(dimensions[1], size=count)

    # random direction of each stick
    dirrad = 2 * np.pi * np.random.rand(count)

    # to distinguish between type1 and type2 sticks based on random values
    stick_type = np.random.rand(count)
    is_type1 = np.zeros_like(stick_type)
    is_type1[stick_type < type1_content] = True

    # calculate end coordinates
    x1 = x0 + np.rint(np.cos(dirrad) * length).astype(np.int32)
    y1 = y0 + np.rint(np.sin(dirrad) * length).astype(np.int32)

    # stack together start and end coordinates
    coordinates = np.vstack((x0, y0, x1, y1)).T.astype(np.int32)

    # split array according to type
    coords_type1 = coordinates[is_type1 == True]
    coords_type2 = coordinates[is_type1 == False]

    return ([coords_type1, coords_type2])

list1, list2 = create_coordinates_vect()

Timing analysis gives the following results for the different parts:

=> x0, y0:                       477.3640632629945 ms
=> dirrad, stick_type:           317.4648284911094 ms
=> is_type1:                      27.3699760437172 ms
=> x1, y1:                      1184.7038269042969 ms
=> vstack:                       189.0783309965234 ms
=> coords_type1, coords_type2:   309.9758625035176 ms

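I could still save some time by fixing the number of type1 and type2 sticks in advance instead of doing a random-number comparison for every stick. However, the more expensive parts, namely creating the random start coordinates and directions and computing the end coordinates, would remain unchanged.

Does anyone see a further optimization to speed up the creation of the arrays?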
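The idea of fixing the type counts up front could look roughly like the sketch below. It is an illustration, not code from the original post: the helper name `split_by_type_count` is hypothetical, and drawing the number of type1 sticks from a binomial distribution is an assumption made here so that the split stays statistically equivalent to the per-stick comparison. Since every row is generated independently from the same distribution, taking the first `n_type1` rows as type1 should not need a boolean mask.

```python
import numpy as np

def split_by_type_count(coordinates, type1_content=0.001, rng=None):
    """Split rows into type1/type2 by drawing only the type1 count.

    Hypothetical helper; not part of the original question.
    """
    rng = np.random.default_rng() if rng is None else rng
    count = coordinates.shape[0]
    # One binomial draw replaces `count` uniform draws plus a comparison.
    n_type1 = rng.binomial(count, type1_content)
    # Rows are independent and identically distributed, so a positional
    # split is as random as a masked one. Slices are views, not copies.
    return coordinates[:n_type1], coordinates[n_type1:]

# Small demo with placeholder values (not real stick coordinates).
demo = np.arange(4000, dtype=np.int32).reshape(1000, 4)
coords_type1, coords_type2 = split_by_type_count(
    demo, type1_content=0.1, rng=np.random.default_rng(0)
)
print(coords_type1.shape, coords_type2.shape)
```

This would remove the `stick_type` draw and the two masked copies, but, as noted above, it leaves the start-coordinate, direction, and end-coordinate steps untouched.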

Answer 1

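As the timings show, the x1 & y1 computation is the slowest part of the code. In it, we compute the cosine and sine, scale by length, then round and convert to int32. One of the ways people boost NumPy performance is by using the numexpr module.

In our slowest part, the operations that numexpr can compute are sine, cosine and the scaling. So the numexpr-modified version of the code would look like this -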

import numexpr as ne

x1 = x0 + np.rint(ne.evaluate("cos(dirrad) * length")).astype(np.int32)
y1 = y0 + np.rint(ne.evaluate("sin(dirrad) * length")).astype(np.int32)

Runtime test -

Let's consider array shapes that are (1/100)th of the original ones. Thus, we have -

dimensions=[15,25]
length=50
count=120000
type1_content=0.001

The initial part of the code stays the same -

# two arrays with random start coordinates in area of dimensions
x0 = np.random.randint(dimensions[0], size=count)
y0 = np.random.randint(dimensions[1], size=count)

# random direction of each stick
dirrad = 2 * np.pi * np.random.rand(count)
# to distinguish between type1 and type2 sticks based on random values
stick_type = np.random.rand(count)
is_type1 = np.zeros_like(stick_type)
is_type1[stick_type < type1_content] = True

Next, we have two branches for runtime testing purposes - one with the original code and another with the proposed numexpr-based approach -

def org_app(x0,y0,dirrad,length):
    x1 = x0 + np.rint(np.cos(dirrad) * length).astype(np.int32)
    y1 = y0 + np.rint(np.sin(dirrad) * length).astype(np.int32)

def new_app(x0,y0,dirrad,length):
    x1 = x0 + np.rint(ne.evaluate("cos(dirrad) * length")).astype(np.int32)
    y1 = y0 + np.rint(ne.evaluate("sin(dirrad) * length")).astype(np.int32)

Finally, the runtime test itself -

In [149]: %timeit org_app(x0,y0,dirrad,length)
10 loops, best of 3: 23.5 ms per loop

In [150]: %timeit new_app(x0,y0,dirrad,length)
100 loops, best of 3: 14.6 ms per loop

So, we are looking at roughly a 40% reduction in runtime (23.5 ms down to 14.6 ms, about 38%), not bad I guess!
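To reproduce this comparison outside IPython, a sketch along the following lines checks that both variants produce the same offsets and times them with the standard `timeit` module. The function names `offsets_numpy` and `offsets_numexpr` are hypothetical, the numexpr part is skipped if the module is not installed (an assumption made here so the script still runs), and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: treat numexpr as optional and skip it if missing
    ne = None

def offsets_numpy(dirrad, length):
    dx = np.rint(np.cos(dirrad) * length).astype(np.int32)
    dy = np.rint(np.sin(dirrad) * length).astype(np.int32)
    return dx, dy

def offsets_numexpr(dirrad, length):
    # numexpr looks up `dirrad` and `length` in the calling frame.
    dx = np.rint(ne.evaluate("cos(dirrad) * length")).astype(np.int32)
    dy = np.rint(ne.evaluate("sin(dirrad) * length")).astype(np.int32)
    return dx, dy

count, length = 120000, 50
dirrad = 2 * np.pi * np.random.rand(count)

# Best of 3 repeats, 10 calls each, reported per call.
t_numpy = min(timeit.repeat(lambda: offsets_numpy(dirrad, length),
                            number=10, repeat=3)) / 10
print(f"numpy:   {t_numpy * 1e3:.2f} ms per call")

max_diff = 0
if ne is not None:
    ref = offsets_numpy(dirrad, length)
    new = offsets_numexpr(dirrad, length)
    # Tolerate a difference of 1 in case a borderline value rounds differently.
    max_diff = max(int(np.abs(r - n).max()) for r, n in zip(ref, new))
    t_numexpr = min(timeit.repeat(lambda: offsets_numexpr(dirrad, length),
                                  number=10, repeat=3)) / 10
    print(f"numexpr: {t_numexpr * 1e3:.2f} ms per call, max diff {max_diff}")
```

Returning the offsets from both functions, unlike `org_app` and `new_app` above, makes it possible to compare the results as well as the timings.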

Time-efficient way to create many random stick coordinates with numpy

1 answer: