Question

I have a simple raytracer in Python. Rendering a 200x200 image takes 4 minutes, which is definitely too much for my taste. I want to improve the situation.

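Some points: I shoot multiple rays per pixel (to provide antialiasing), for a total of 16 rays per pixel. 200x200x16 comes to 640000 rays. Each ray must be tested for intersection against multiple Sphere objects in the scene. Ray is also a rather trivial object: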

import numpy

class Ray(object):
    def __init__(self, origin, direction):
        self.origin = numpy.array(origin)
        self.direction = numpy.array(direction)

Sphere is slightly more complex, and carries the hit / nohit logic:

import math
import numpy

class Sphere(object):
    def __init__(self, center, radius, color):
        self.center = numpy.array(center)
        self.radius = numpy.array(radius)
        self.color = color

    @profile 
    def hit(self, ray):
        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius * self.radius
        disc = b * b - 4.0 * a * c

        if (disc < 0.0):
            return None
        else:
            e = math.sqrt(disc)
            denom = 2.0 * a
            t = (-b - e) / denom 
            if (t > 1.0e-7):
                normal = (temp + t * ray.direction) / self.radius
                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal, 
                                               hit_point=hit_point, 
                                               parameter=t, 
                                               color=self.color)           

            t = (-b + e) / denom

            if (t > 1.0e-7):
                normal = (temp + t * ray.direction) / self.radius
                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal, 
                                               hit_point=hit_point, 
                                               parameter=t, 
                                               color=self.color)       

        return None

Now, I ran some profiling, and it appears that the longest processing time is spent in the hit() function:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2560000  118.831    0.000  152.701    0.000 raytrace/objects/Sphere.py:12(hit)
  1960020   42.989    0.000   42.989    0.000 {numpy.core.multiarray.array}
        1   34.566   34.566  285.829  285.829 raytrace/World.py:25(render)
  7680000   33.796    0.000   33.796    0.000 {numpy.core._dotblas.dot}
  2560000   11.124    0.000  163.825    0.000 raytrace/World.py:63(f)
   640000   10.132    0.000  189.411    0.000 raytrace/World.py:62(hit_bare_bones_object)
   640023    6.556    0.000  170.388    0.000 {map}

This doesn't surprise me, and I want to reduce this value as much as possible. I moved on to line profiling, and the result is:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                               @profile
    13                                               def hit(self, ray):
    14   2560000     27956358     10.9     19.2          temp = ray.origin - self.center
    15   2560000     17944912      7.0     12.3          a = numpy.dot(ray.direction, ray.direction)
    16   2560000     24132737      9.4     16.5          b = 2.0 * numpy.dot(temp, ray.direction)
    17   2560000     37113811     14.5     25.4          c = numpy.dot(temp, temp) - self.radius * self.radius
    18   2560000     20808930      8.1     14.3          disc = b * b - 4.0 * a * c
    19                                                   
    20   2560000     10963318      4.3      7.5          if (disc < 0.0):
    21   2539908      5403624      2.1      3.7              return None
    22                                                   else:
    23     20092        75076      3.7      0.1              e = math.sqrt(disc)
    24     20092       104950      5.2      0.1              denom = 2.0 * a
    25     20092       115956      5.8      0.1              t = (-b - e) / denom
    26     20092        83382      4.2      0.1              if (t > 1.0e-7):
    27     20092       525272     26.1      0.4                  normal = (temp + t * ray.direction) / self.radius
    28     20092       333879     16.6      0.2                  hit_point = ray.origin + t * ray.direction
    29     20092       299494     14.9      0.2                  return ShadeRecord.ShadeRecord(normal=normal, hit_point=hit_point, parameter=t, color=self.color)

So, it appears that most of the time is spent in this chunk of code:

        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius * self.radius
        disc = b * b - 4.0 * a * c

I don't really see much to optimize here. Do you have any idea how to make this code more performant without going to C?

Answer 1

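Looking at your code, it looks like your main problem is that you have lines of code that are called 2560000 times. That will take a lot of time regardless of what kind of work you do in that code. However, using numpy, you can aggregate a lot of this work into a small number of numpy calls.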

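The first thing to do is to combine your rays into large arrays. Instead of using a Ray object with 1x3 vectors for origin and direction, use Nx3 arrays that hold all of the rays you need for hit detection. The top of your hit function will end up looking like this: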

temp = rays.origin - self.center
b = 2.0 * numpy.sum(temp * rays.direction,1)
c = numpy.sum(numpy.square(temp), 1) - self.radius * self.radius
disc = b * b - 4.0 * c  # a == 1 is assumed here (unit-length ray directions)

For the next part you can use

possible_hits = numpy.where(disc >= 0.0)
b = b[possible_hits]
disc = disc[possible_hits]
...

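to continue with only the values that pass the discriminant test. You can easily get order-of-magnitude performance improvements this way.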
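As a rough sketch of how the two snippets above could fit together, the function below tests N rays against one sphere in a handful of NumPy calls. The name `hit_many`, the `eps` parameter, and the choice of returning `inf` for a miss are assumptions made for illustration and are not part of the original code. Unlike the snippet above, it keeps `a`, so the directions do not have to be unit length.

```python
import numpy as np

def hit_many(origins, directions, center, radius, eps=1.0e-7):
    """Nearest hit parameter t for each ray; np.inf where a ray misses.

    origins and directions are (N, 3) arrays, center is a length-3 array.
    """
    temp = origins - center                             # (N, 3)
    a = np.sum(directions * directions, axis=1)         # (N,)
    b = 2.0 * np.sum(temp * directions, axis=1)
    c = np.sum(temp * temp, axis=1) - radius * radius
    disc = b * b - 4.0 * a * c

    t = np.full(len(origins), np.inf)
    idx = np.nonzero(disc >= 0.0)[0]    # rays that pass the discriminant test
    e = np.sqrt(disc[idx])
    denom = 2.0 * a[idx]
    t_near = (-b[idx] - e) / denom
    t_far = (-b[idx] + e) / denom
    # Prefer the nearer root and fall back to the farther one,
    # mirroring the two branches of the scalar hit().
    t[idx] = np.where(t_near > eps, t_near,
                      np.where(t_far > eps, t_far, np.inf))
    return t

origins = np.zeros((3, 3))
directions = np.array([[0.0, 0.0, 1.0],     # aimed at the sphere
                       [0.0, 1.0, 0.0],     # passes beside it
                       [0.0, 0.0, -1.0]])   # points away from it
t = hit_many(origins, directions, center=np.array([0.0, 0.0, 5.0]), radius=1.0)
```

Normals and hit points can then be computed only for the rows where `t` is finite, which keeps the expensive part proportional to the number of actual hits.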

Answer 2

1) Ray tracing is fun, but if you care at all about performance, dump Python and switch to C. Not C++, unless you are some kind of super expert, just C.

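2) The big win in scenes with multiple (20 or more) objects is to use a spatial index to reduce the number of intersection tests. Popular options are kD-trees, octrees, and AABBs (axis-aligned bounding boxes).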

3) If you're serious, check out ompf.org - it is the resource for this.

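4) Don't go there asking about optimization in Python - most people can shoot 1 million to 2 million rays per second through an indoor scene with 100 thousand triangles... per core.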

I love Python and ray tracing, but would never consider putting them together. In this case, the correct optimization is to switch languages.
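To illustrate point 2 in the thread's own language, here is a minimal sketch of the building block behind AABB-based indexes: a cheap ray/box "slab" test that lets a whole group of spheres be skipped at once. It is not a kD-tree or an octree, and the names `ray_hits_aabb` and `group_bounds` are hypothetical helpers introduced only for this example.

```python
import math

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: True if origin + t * direction (t >= 0) crosses the box."""
    t_enter, t_exit = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True

def group_bounds(spheres):
    """Axis-aligned box enclosing every (center, radius) pair in spheres."""
    lo = [min(c[i] - r for c, r in spheres) for i in range(3)]
    hi = [max(c[i] + r for c, r in spheres) for i in range(3)]
    return lo, hi

cluster = [((0.0, 0.0, 5.0), 1.0), ((1.5, 0.0, 6.0), 0.5)]
box_min, box_max = group_bounds(cluster)

# Only rays that cross the box need the per-sphere hit() tests.
toward = ray_hits_aabb((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), box_min, box_max)
away = ray_hits_aabb((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), box_min, box_max)
```

A real spatial index nests such boxes (or splitting planes) so that most rays touch only a few leaves; with a handful of spheres the extra test may cost more than it saves.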

Answer 3

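With code like this, you may benefit from memoizing common subexpressions, such as self.radius * self.radius as self.radius2 and 1 / self.radius as self.one_over_radius. The overhead of the Python interpreter may dominate such trivial improvements, though.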
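A minimal sketch of that idea, assuming the radius never changes after construction. The attribute names follow the ones suggested above; `normal_at` is a hypothetical helper added only to show where the cached reciprocal would be used, and storing the radius as a plain float (rather than a NumPy array, as in the question) is also an assumption.

```python
import numpy as np

class Sphere(object):
    def __init__(self, center, radius, color):
        self.center = np.array(center, dtype=float)
        self.radius = float(radius)
        self.color = color
        # Computed once here instead of on every hit() call.
        self.radius2 = self.radius * self.radius
        self.one_over_radius = 1.0 / self.radius

    def normal_at(self, temp, t, direction):
        # Multiply by the cached reciprocal instead of dividing by the radius.
        return (temp + t * direction) * self.one_over_radius

sphere = Sphere((0.0, 0.0, 5.0), 2.0, "red")
normal = sphere.normal_at(np.array([0.0, 0.0, -5.0]), 3.0,
                          np.array([0.0, 0.0, 1.0]))
```

Inside `hit()`, the `c` term would then read `numpy.dot(temp, temp) - self.radius2`.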

Answer 4

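One minor optimization: a and b * b are never negative, so disc < 0.0 is guaranteed whenever (c > 0 && abs(b) < min(a, c)). In that case you can avoid calculating b * b - 4.0 * a * c. Note that the magnitude of b has to be compared, since a large negative b would otherwise pass the test while disc is positive. Given the profile you posted, I guess this will shave at most 7% off the running time.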
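A small sketch of that early-out, with a randomized sanity check. The function name `surely_misses` is made up for this example, and the test is only sufficient, not necessary: it must use the magnitude of b, because for a = 1, b = -10, c = 1 the discriminant is 96 even though b is below min(a, c).

```python
import random

def surely_misses(a, b, c):
    """Cheap sufficient (not necessary) test for b*b - 4*a*c < 0, given a > 0."""
    return c > 0.0 and abs(b) < min(a, c)

random.seed(0)
checked = 0
for _ in range(100000):
    a = random.uniform(0.1, 2.0)
    b = random.uniform(-5.0, 5.0)
    c = random.uniform(-5.0, 5.0)
    if surely_misses(a, b, c):
        checked += 1
        # Whenever the quick test fires, the full discriminant is negative.
        assert b * b - 4.0 * a * c < 0.0
```

How often the quick test fires depends on the scene, so it is worth profiling before and after to see whether the extra comparison pays for itself.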

Answer 5

Your best bet is to use lookup tables and pre-calculated values as much as possible.

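Since your response to my comment indicates that your ray direction vectors are unit vectors, there is at least one optimization you can make right away in the critical section you listed. Any vector dotted with itself is its length squared, so a unit vector dotted with itself will always be 1.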
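For reference, this is where the coefficients in `hit()` come from and what the unit-length assumption buys. The symbols are my own shorthand, not from the original code: o is `ray.origin`, d is `ray.direction`, s is `self.center`, and r is `self.radius`.

```latex
\[
\begin{aligned}
\lVert \mathbf{o} + t\,\mathbf{d} - \mathbf{s} \rVert^2 &= r^2,
  \qquad \mathbf{temp} = \mathbf{o} - \mathbf{s} \\
\underbrace{(\mathbf{d}\cdot\mathbf{d})}_{a}\,t^2
  + \underbrace{2\,(\mathbf{temp}\cdot\mathbf{d})}_{b}\,t
  + \underbrace{(\mathbf{temp}\cdot\mathbf{temp} - r^2)}_{c} &= 0 \\
\lVert\mathbf{d}\rVert = 1 \;\Rightarrow\; a = 1, \qquad
  \mathrm{disc} = b^2 - 4ac &= b^2 - 4c
\end{aligned}
\]
```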

Also, pre-calculate the radius squared (in your sphere's __init__ function).

Then you've got:

temp = ray.origin - self.center
a = 1 # or skip this and optimize out later
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius_squared
disc = b * b - 4.0 * c

temp dot temp gives you the equivalent of sum( map( lambda component: component*component, temp ) )... I'm not sure which is faster, though.
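One way to settle which is faster on a given machine is to time both with `timeit`. The sketch below is only a measurement harness: the sample vector is arbitrary, and the third variant using plain Python floats is an extra comparison point that was not suggested in the answer.

```python
import timeit
import numpy as np

temp = np.array([0.3, -1.2, 4.5])

def with_dot():
    return np.dot(temp, temp)

def with_map():
    return sum(map(lambda component: component * component, temp))

def with_floats(x=0.3, y=-1.2, z=4.5):
    # Plain Python floats, no NumPy involved.
    return x * x + y * y + z * z

for fn in (with_dot, with_map, with_floats):
    seconds = timeit.timeit(fn, number=20000)
    print("%-12s %.4f s for 20000 calls" % (fn.__name__, seconds))
```

The timings vary with the NumPy version and hardware, so no expected numbers are given here.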
