Question

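In a 2D plane, there is a large circle centered at (0,0) with a known radius. It encloses about 100 smaller circles, distributed randomly across the parent circle, whose radii and positions relative to the origin are known. (Some of the smaller sub-circles may lie partially or entirely inside some of the larger sub-circles.)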

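The whole plane is divided uniformly into pixels whose sides are horizontal and vertical (aligned with the coordinate axes). The pixel size is fixed and known a priori, but it is much smaller than the parent circle; there are about 1000 *special* pixels across the parent circle. We are given the 2D Cartesian coordinates of (the centers of) all these special grids. Sub-circles that enclose at least one of these special grids are called *special* sub-circles for later use.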

Now suppose this whole 3D space is filled with ~100 million particles. My code tries to add up the particles inside each special sub-circle.

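I managed to debug my code, but with such a large number of particles it is very slow, as written below. I'd like to know whether there are any tricks to speed it up by at least an order of magnitude.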

import collections
from math import log10

...
sub_circle_catalog, some_parameter_catalog, totals = [], [], {}  # accumulated over all grids, so initialize once

for x, y in zip(vals1, vals2):  # vals1, vals2: 1D arrays (~1000 entries) with the x and y coordinates of the *special* grids
    enclosing_circles, some_parameter = {}, {}  # per-grid results

    for pid, mass in zip(ids_data, masss_data):  # both arrays have ~100,000,000 entries
        rule1 = some_condition        # checks whether this special grid lies inside the current circle
        rule2 = some_other_condition  # keeps only circles larger than some threshold size

        if rule1 and rule2:
            calculated_property = some_function

            if condition_3:
                calculated_some_other_property = some_other_function

                if condition_4:
                    some_quantity = something
                    enclosing_circles[pid] = float('{:.4f}'.format(log10(mass)))
                    some_parameter[pid] = float('{:.3e}'.format(some_quantity))

    # collect all sub-circles enclosing this special pixel
    sub_circle_catalog += [(enclosing_circles[i], 1) for i in enclosing_circles]
    some_parameter_catalog += [(enclosing_circles[i], some_parameter[i]) for i in enclosing_circles]

# add up all special grids in each sub-circle, accumulated over all grids
for key, value in sub_circle_catalog:
    totals[key] = totals.get(key, 0) + value
totals_dict = collections.OrderedDict(sorted(totals.items()))
totals_list = list(totals_dict.items())  # sorted by key

with open(some_file_path, "a") as some_file:
    print('{}'.format(totals_list), file=some_file)  # the with-block closes the file automatically
...

Answer 1

rule1, followed by rule2, is what takes the most time.

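Inline rule1 and rule2 into the if condition. If and knows the first part is false, it doesn't evaluate the second part. Also try swapping them to see whether that works better.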
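A minimal sketch of the short-circuit effect, using hypothetical stand-ins cheap_rule and expensive_rule for the question's rule2 and rule1 (the real conditions are not shown). The call counter shows how often each check actually runs:

```python
# Hypothetical stand-ins for the question's rule2 (size check) and rule1 (point-in-circle),
# each counting how many times it is called.
calls = {"cheap": 0, "expensive": 0}

def cheap_rule(radius, threshold):
    """Stand-in for rule2: is the circle larger than the threshold?"""
    calls["cheap"] += 1
    return radius > threshold

def expensive_rule(px, py, cx, cy, radius):
    """Stand-in for rule1: does the point (px, py) lie inside the circle?"""
    calls["expensive"] += 1
    return (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2

circles = [(0.0, 0.0, 0.5), (1.0, 1.0, 2.0), (3.0, 3.0, 0.1)]  # (cx, cy, radius)
threshold = 1.0
px, py = 1.5, 1.5

# Assigning rule1 = ...; rule2 = ... before the if always pays for both checks.
# Putting them directly inside `and` lets Python skip the second operand
# whenever the first one is False.
hits = [c for c in circles
        if cheap_rule(c[2], threshold) and expensive_rule(px, py, *c)]

print(hits, calls)
```

Here the size check goes first because it is cheap and rejects two of the three circles, so the point-in-circle test runs only once. Whether that ordering wins for the real rules depends on their cost and selectivity, which is worth measuring.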

Depending on the details of how these rules are computed, you may find other opportunities for shortcuts like this.
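One such shortcut, assuming rule1 is a plain point-in-circle test (the question does not show its code): reject points outside the circle's bounding square before doing the full squared-distance comparison. inside_circle is a hypothetical helper, not part of the original code:

```python
def inside_circle(px, py, cx, cy, r):
    """Point-in-circle test ordered so that the cheap checks run first.

    The two axis-aligned comparisons reject points outside the circle's
    bounding square; `and` short-circuits, so the squared-distance test
    only runs for points that survive them.
    """
    dx = px - cx
    dy = py - cy
    return abs(dx) <= r and abs(dy) <= r and dx * dx + dy * dy <= r * r
```

This only pays off if most points are far from most circles; for points near the circle, all three comparisons still run.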

Always profile to find the bottleneck. Otherwise you can waste a lot of time optimizing parts that don't help much.
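A minimal profiling sketch with the standard-library cProfile and pstats modules; slow_loop and rule_check are hypothetical stand-ins for the question's loop. For a whole script, python -m cProfile -s cumulative your_script.py prints a similar report.

```python
import cProfile
import io
import pstats

def rule_check(i):
    # Hypothetical stand-in for an expensive per-particle test.
    return (i * i) % 7 == 0

def slow_loop(n):
    total = 0
    for i in range(n):
        if rule_check(i):
            total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_loop(100_000)
profiler.disable()

# Sort by cumulative time and keep the top 5 entries to see where the time goes.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```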

Use shortcuts whenever possible; don't waste time computing things you don't need.
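An example of not computing what you don't need, based on the question's definition of a special sub-circle (one that encloses at least one special grid): any() stops at the first enclosing grid instead of testing all ~1000. The helper name is_special and the (cx, cy, r) tuple layout are assumptions for illustration:

```python
def is_special(circle, grids):
    """Return True if the circle (cx, cy, r) encloses at least one grid point.

    any() stops at the first enclosed grid, so the remaining grids are never tested.
    """
    cx, cy, r = circle
    r2 = r * r
    return any((gx - cx) ** 2 + (gy - cy) ** 2 <= r2 for gx, gy in grids)
```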

Avoid function calls in nested loops by inlining them. Calls are relatively slow in CPython.
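A small sketch comparing a loop that calls a helper per point with the same test written inline; in_circle and the point data are hypothetical. Timings vary by machine and Python version, so the code only checks that both versions agree and prints the times:

```python
import timeit

def in_circle(dx, dy, r2):
    return dx * dx + dy * dy <= r2

points = [(i * 0.001, (i % 97) * 0.01) for i in range(10_000)]
r2 = 0.25  # squared radius of a circle centered at (0.5, 0.5)

def count_with_calls():
    n = 0
    for x, y in points:
        if in_circle(x - 0.5, y - 0.5, r2):  # one Python function call per point
            n += 1
    return n

def count_inlined():
    n = 0
    for x, y in points:
        dx = x - 0.5
        dy = y - 0.5
        if dx * dx + dy * dy <= r2:  # same test, no call overhead
            n += 1
    return n

assert count_with_calls() == count_inlined()
print("with calls:", timeit.timeit(count_with_calls, number=20))
print("inlined:   ", timeit.timeit(count_inlined, number=20))
```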

Unroll inner loops to reduce loop overhead.
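A sketch of manual unrolling on a hypothetical mass-summing loop: four items are handled per iteration and a short second loop picks up the leftovers. Because the additions happen in a different order, floating-point results can differ from the plain loop in the last bits:

```python
def total_mass_rolled(masses):
    total = 0.0
    for m in masses:
        total += m
    return total

def total_mass_unrolled(masses):
    """Sum four items per iteration to cut loop-control overhead."""
    total = 0.0
    n = len(masses)
    stop = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, stop, 4):
        total += masses[i] + masses[i + 1] + masses[i + 2] + masses[i + 3]
    for i in range(stop, n):  # remaining 0 to 3 items
        total += masses[i]
    return total
```

In CPython the saving is usually modest; unrolling tends to matter more in compiled code such as Cython or Numba.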

Hoist calculations out of loops whenever possible instead of redoing them on every iteration.
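In the question, rule2 (the circle-size threshold) does not depend on the grid point, so it can be evaluated once per circle instead of once per grid and circle. A sketch with hypothetical names, where each circle is a (cid, cx, cy, r) tuple:

```python
def special_counts_naive(grids, circles, min_radius):
    counts = {}
    for gx, gy in grids:
        for cid, cx, cy, r in circles:
            # The size check and r * r are recomputed for every grid point.
            if r > min_radius and (gx - cx) ** 2 + (gy - cy) ** 2 <= r * r:
                counts[cid] = counts.get(cid, 0) + 1
    return counts

def special_counts_hoisted(grids, circles, min_radius):
    # Loop-invariant work done once: drop small circles and precompute r squared.
    big = [(cid, cx, cy, r * r) for cid, cx, cy, r in circles if r > min_radius]
    counts = {}
    for gx, gy in grids:
        for cid, cx, cy, r2 in big:
            if (gx - cx) ** 2 + (gy - cy) ** 2 <= r2:
                counts[cid] = counts.get(cid, 0) + 1
    return counts
```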

Consider compiling the whole thing with Nuitka or Cython, or running it under PyPy. (Or use Cython or Numba for just the slow parts.)
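A sketch of the Numba route for the slowest part, assuming particle positions are projected onto the 2D plane as NumPy arrays px, py (an assumption; the question does not show its data layout). mass_in_circle is hypothetical; the fallback keeps the sketch runnable, just slower, when Numba is not installed:

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function to machine code on first call
except ImportError:  # without Numba, fall back to plain Python so the sketch still runs
    def njit(func):
        return func

@njit
def mass_in_circle(px, py, masses, cx, cy, r):
    """Sum the masses of all particles inside one circle using an explicit loop."""
    r2 = r * r
    total = 0.0
    for i in range(px.shape[0]):
        dx = px[i] - cx
        dy = py[i] - cy
        if dx * dx + dy * dy <= r2:
            total += masses[i]
    return total
```

Explicit loops like this are the style Numba handles well; under plain CPython the same function would be about as slow as the original loop.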

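Consider rewriting this part in Julia, which is faster and easy to call from Python. It is better to extract and call the entire inner loop rather than just its body, to avoid per-iteration call overhead.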

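Consider vectorizing the computation with NumPy where possible, even if only for part of the loop. NumPy's inner loops are much faster than Python's, at the cost of more memory. If NumPy vectorization works, you may be able to speed things up further with CuPy, which runs on the GPU, or with Dask, which can handle larger datasets.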
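A minimal vectorized sketch, assuming particles are given as 2D-projected NumPy arrays px, py, masses and the circles as arrays cx, cy, r (these names are assumptions). Broadcasting tests every particle in a chunk against every circle at once, and chunking bounds the size of the (chunk, n_circles) temporaries:

```python
import numpy as np

def mass_per_circle(px, py, masses, cx, cy, r, chunk=100_000):
    """Total particle mass inside each circle, vectorized with NumPy broadcasting.

    px, py, masses: 1D arrays, one entry per particle.
    cx, cy, r: 1D arrays, one entry per circle.
    """
    r2 = r * r
    totals = np.zeros(len(cx), dtype=np.float64)
    for start in range(0, len(px), chunk):
        sl = slice(start, start + chunk)
        dx = px[sl, None] - cx[None, :]      # shape (chunk, n_circles)
        dy = py[sl, None] - cy[None, :]
        inside = dx * dx + dy * dy <= r2     # boolean mask; r2 broadcasts over rows
        totals += masses[sl] @ inside.astype(np.float64)  # per-circle sum of enclosed masses
    return totals
```

With about 100 circles and the default chunk, each float temporary is roughly 80 MB; a smaller chunk trades speed for memory. Passing the special-grid coordinates with np.ones as the masses would instead count the special grids inside each circle.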

Speeding up nested for loops with if conditions
