Question

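I'm trying to speed up code that calculates the volume of a sphere (see the code below). The volume is generated by calculating small volume segments dv and summing them into a single volume.

This code is really just a sanity check before I apply the calculation to other spheroids with symmetric properties, so I should be able to speed it up by calculating a smaller part of the volume and multiplying the final result.

Replace 360 and 180 with 180 and 90 in while (phid<=(360.0/adstep)) and while (thetad<=(180.0/adstep)) respectively, and you need only a quarter of the calculations, meaning you can simply multiply the final vol by 4.0.

This works if I set phi to 180 and leave theta at 180, which halves the calculations. But when I set theta to 90, the result is off.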

Output:

Phi 360, Theta 180
Actual Volume       Calculated Volume   % Difference
4.18879020478639053 4.18878971565348923 0.00001167718922403

Phi 180, Theta 180
4.18879020478639053 4.18878971565618219 0.00001167712493440

Phi 180, Theta 90
4.18879020478639053 4.18586538829648180 0.06987363946500515

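You can see above that the first two calculations are almost identical (I assume the difference is due to precision error), while the last one gives a noticeably different result. Could the nested loops be causing the problem?

Any help would be appreciated, as I haven't found anything in my research (Google & Stack Overflow) that describes the problem I'm having.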

#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;

int main()
{
double thetar, phir, maxr, vol, dv, vol2, arstep, adstep, rad, rad3, thetad, phid, ntheta, nphi;
cout << fixed << setprecision(17); // Set output precision to defined number of decimal places. Note Double has up to 15 decimal place accuracy

vol=0.0;            // Initialise volume and set at zero
adstep=0.1;     // Steps to rotate angles in degrees
arstep=(adstep/180.0)*M_PI;     // Angle steps in radians
phid=1.0;           // Phi in degrees starting at adstep
maxr = 1.0;         // Radius of the sphere

// Loop to calculate volume

while (phid<=(360.0/adstep))            // Loop over Phi divided by adstep. This scales the loop to the desired number of calculations.
{
    phir=((phid*adstep)/180.0)*M_PI;        // Phi in radians
    thetad=1.0;                             // Theta in degrees, reset to initial adstep value
    while (thetad<=(180.0/adstep))          // Loop over Theta divided by adstep. Like Phi loop, this scales the loop to the desired number of calculations
    {
        thetar=((thetad*adstep)/180.0)*M_PI;    // Convert theta degrees to radians
        dv = ((maxr*maxr*maxr) * sin(thetar) * arstep * arstep) / 3.0;      // Volume of current segment
        vol += dv;      // Summing all the dv value together to generate a global volume
        thetad+=1.0;    // Increase theta (degrees) by a single step
    }
    phid+=1.0;      // Increase phi (degrees) by a single step
}

vol = vol*1.0; // Volume compensated for any reduction in phi and theta

rad3 = (3.0*vol)/(4.0*M_PI);    // volume equivalent radius^3
rad = pow(rad3,(1.0/3.0));      // volume equivalent radius
vol2 = (4.0/3.0)*M_PI*(maxr*maxr*maxr);     // Calculated volume of a sphere given initial maxr

// Diagnostic output
cout << vol2 << " " << vol << " " << ((vol2-vol)/vol)*100.0 << endl;

}

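Edit: Corrected the starting values of phid and thetad to 1.0.

Edit 2: Just an update for future viewers: using the Kahan summation algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) removed almost all of my precision error caused by adding a small number to a large one. There are other methods, but this is one of the simplest and it does the job I need. For posterity, here is the example pseudocode taken from the Wikipedia page on the topic: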

function KahanSum(input)
 var sum = 0.0
 var c = 0.0                  // A running compensation for lost low-order bits.
 for i = 1 to input.length do
    var y = input[i] - c     // So far, so good: c is zero.
    var t = sum + y          // Alas, sum is big, y small, so low-order digits of y are lost.
    c = (t - sum) - y // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
    sum = t           // Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!
    // Next time around, the lost low part will be added to y in a fresh attempt.
 return sum

Answer 1

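As far as speed goes, I suspect (without having profiled it) that a lot of time is wasted converting between radians and degrees, and also computing all those sin values. AFAICT, thetar loops over the same values during each iteration of the outer loop, so it would probably be more efficient to precompute sin(thetar) once before the main loop and do a simple lookup in the inner loop.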
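A rough sketch of that lookup idea, assuming integer step counts n_phi and n_theta instead of the degree-based loop bounds in the question. The function name sphere_volume_lookup and its parameters are hypothetical, and I have not timed it.

```cpp
#include <cmath>
#include <vector>

// Sphere volume as the sum of dv = (r^3 / 3) * sin(theta) * dtheta * dphi,
// with sin(theta) computed once per theta step and reused for every phi step.
// sphere_volume_lookup is a hypothetical name for this sketch.
double sphere_volume_lookup(int n_phi, int n_theta, double radius)
{
    const double pi = std::acos(-1.0);
    const double d_phi = 2.0 * pi / n_phi;   // phi step in radians
    const double d_theta = pi / n_theta;     // theta step in radians

    // Precompute the sine table once, outside the main loops.
    std::vector<double> sin_theta(n_theta);
    for (int j = 0; j < n_theta; ++j)
        sin_theta[j] = std::sin((j + 1) * d_theta);

    const double factor = radius * radius * radius * d_theta * d_phi / 3.0;
    double vol = 0.0;
    for (int i = 0; i < n_phi; ++i)
        for (int j = 0; j < n_theta; ++j)
            vol += factor * sin_theta[j];    // table lookup instead of sin()
    return vol;
}
```

With the question's 0.1 degree step this would be called as sphere_volume_lookup(3600, 1800, 1.0).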

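As for numerical stability, as vol grows ever larger relative to dv, you will lose more and more precision over time. If you could store all the dv values in an array and then sum it with a divide-and-conquer approach rather than a linear pass, you would in principle get better results. Then again, I count (only) 6 480 000 total iterations, so I think a double accumulator (which holds 15-17 significant base-10 digits) can actually handle losing 6-7 digits without much trouble.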
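To make the divide-and-conquer idea concrete, here is a minimal sketch that assumes the dv values have already been stored in a std::vector<double>. The name pairwise_sum and the base-case size of 8 are arbitrary choices for illustration.

```cpp
#include <cstddef>
#include <vector>

// Pairwise (divide-and-conquer) summation of values[first, last).
// Each partial sum only ever combines numbers of similar magnitude,
// which limits the rounding error compared with one long linear pass.
double pairwise_sum(const std::vector<double>& values,
                    std::size_t first, std::size_t last)
{
    const std::size_t n = last - first;
    if (n <= 8) {                            // small base case: plain loop
        double s = 0.0;
        for (std::size_t i = first; i < last; ++i)
            s += values[i];
        return s;
    }
    const std::size_t mid = first + n / 2;   // split the range in half
    return pairwise_sum(values, first, mid) + pairwise_sum(values, mid, last);
}
```

A call such as pairwise_sum(dvs, 0, dvs.size()) sums the whole array; the cost is the memory for the array, which the running-total version does not need.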

Answer 2

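The most likely problem is that you exit the loop one iteration earlier than you need to. You should not compare floating-point numbers for equality. A quick way around this is to add a small constant, e.g.

while (thetad < (180.0/adstep) + 1e-8)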
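An alternative to the epsilon, sketched here under my own assumptions, is to round the step count to an integer once and loop on that integer, so the loop condition never involves a floating-point comparison. The helper names step_count and integrate_sin are hypothetical.

```cpp
#include <cmath>

// Number of steps of size step_deg that fit in range_deg, rounded to the
// nearest integer so that a quotient like 899.9999999 still counts as 900.
long step_count(double range_deg, double step_deg)
{
    return std::lround(range_deg / step_deg);
}

// Sum of sin(theta) * dtheta for theta = dtheta, 2*dtheta, ..., range,
// driven by an integer counter instead of a floating-point loop variable.
double integrate_sin(double range_deg, double step_deg)
{
    const double pi = std::acos(-1.0);
    const double step_rad = step_deg / 180.0 * pi;
    const long n = step_count(range_deg, step_deg);
    double sum = 0.0;
    for (long k = 1; k <= n; ++k)
        sum += std::sin(k * step_rad) * step_rad;
    return sum;
}
```

This only fixes the iteration count. When the range ends at 90 degrees, where sin is not zero, an endpoint sum like this one still carries an error of roughly half a step, so evaluating sin at the midpoint of each step may be worth trying as well.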

Answer 3

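This is not a very thorough analysis, but it may give you some insight into the source of the error. In your code you are accumulating 3240000 floating-point values. As vol grows, the ratio between vol and dv increases, and you lose more and more precision in each addition.

The standard way to reduce the precision loss when accumulating many values into a single value (called a reduction sum) is to perform the addition in blocks: for example, you could add up every 8 values and store the partial sums in an array, then add up every 8 values of that array, and so on, until a single value remains. This will probably give you a better result.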

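It is also worth considering that you are taking linear steps in angle over the spherical surface, so you are not sampling the sphere uniformly. This may affect your final result. One way to sample the sphere uniformly is to take linear steps in the azimuthal angle phi from 0 to 360 degrees, and to take the polar angle as the acos of values stepping linearly from -1 to 1. For a more detailed explanation, see this link on sphere point-picking.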
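As a rough illustration of that sampling scheme, the sketch below builds a grid with equal steps in phi and equal steps in u = cos(theta), taking each value at the midpoint of its cell. The Point struct, the sphere_grid name, and the midpoint choice are my assumptions, not something taken from the question's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Grid of points on the unit sphere with equal steps in phi and in
// u = cos(theta), so every grid cell covers the same surface area.
// sphere_grid is a hypothetical name for this sketch.
std::vector<Point> sphere_grid(int n_phi, int n_u)
{
    const double pi = std::acos(-1.0);
    std::vector<Point> pts;
    pts.reserve(static_cast<std::size_t>(n_phi) * n_u);
    for (int i = 0; i < n_phi; ++i) {
        const double phi = 2.0 * pi * (i + 0.5) / n_phi;    // azimuth, cell midpoint
        for (int j = 0; j < n_u; ++j) {
            const double u = -1.0 + 2.0 * (j + 0.5) / n_u;  // cos(theta), cell midpoint
            const double theta = std::acos(u);              // polar angle
            const double s = std::sin(theta);
            pts.push_back({s * std::cos(phi), s * std::sin(phi), u});
        }
    }
    return pts;
}
```

With this grid a quarter of the points lie above z = 0.5, matching the share of the sphere's surface area in that cap, whereas equal steps in theta crowd points toward the poles.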

Answer 4

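First, I think there are a couple of errors in your function. I think phid and thetad should both be initialized to 0 or 1.0.

Second, you can gain quite a bit by reducing the number of floating-point multiplications.

In the code below, I moved the contents of the main function into volume1 and created a function volume2 that contains slightly optimized code.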

#include <iostream>
#include <iomanip>
#include <cmath>
#include <ctime>
using namespace std;

void volume1(int numSteps)
{
   double thetar, phir, maxr, vol, dv, vol2, arstep, adstep, rad, rad3, thetad, phid, ntheta, nphi;
   cout << fixed << setprecision(17); // Set output precision to defined number of decimal places. Note Double has up to 15 decimal place accuracy

   vol=0.0;            // Initialise volume and set at zero
   adstep=360.0/numSteps;     // Steps to rotate angles in degrees
   arstep=(adstep/180.0)*M_PI;     // Angle steps in radians
   phid=1.0;            // Phi in degrees starting at adstep
   maxr = 1.0;         // Radius of the sphere

   // Loop to calculate volume

   while (phid<=(360.0/adstep))            // Loop over Phi divided by adstep. This scales the loop to the desired number of calculations.
   {
      phir=((phid*adstep)/180.0)*M_PI;        // Phi in radians
      thetad=1.0;                              // Theta in degrees, reset to initial adstep value
      while (thetad<=(180.0/adstep))          // Loop over Theta divided by adstep. Like Phi loop, this scales the loop to the desired number of calculations
      {
         thetar=((thetad*adstep)/180.0)*M_PI;    // Convert theta degrees to radians
         dv = ((maxr*maxr*maxr) * sin(thetar) * arstep * arstep) / 3.0;      // Volume of current segment
         vol += dv;      // Summing all the dv value together to generate a global volume
         thetad+=1.0;    // Increase theta (degrees) by a single step
      }
      phid+=1.0;      // Increase phi (degrees) by a single step
   }

   vol = vol*1.0; // Volume compensated for any reduction in phi and theta

   rad3 = (3.0*vol)/(4.0*M_PI);    // volume equivalent radius^3
   rad = pow(rad3,(1.0/3.0));      // volume equivalent radius
   vol2 = (4.0/3.0)*M_PI*(maxr*maxr*maxr);     // Calculated volume of a sphere given initial maxr

   // Diagnostic output
   cout << vol2 << " " << vol << " " << ((vol2-vol)/vol)*100.0 << endl << endl;
}

void volume2(int numSteps)
{
   double thetar, maxr, vol, vol2, arstep, adstep, rad, rad3, thetad, phid, ntheta, nphi;
   cout << fixed << setprecision(17); // Set output precision to defined number of decimal places. Note Double has up to 15 decimal place accuracy

   vol=0.0;            // Initialise volume and set at zero
   adstep = 360.0/numSteps;
   arstep=(adstep/180.0)*M_PI;     // Angle steps in radians
   maxr = 1.0;         // Radius of the sphere
   double maxRCube = maxr*maxr*maxr;
   double arStepSquare = arstep*arstep;
   double multiplier = maxRCube*arStepSquare/3.0;

   // Loop to calculate volume

   int step = 1;
   for ( ; step <= numSteps; ++step )
   {
      int numInnerSteps = numSteps/2;
      thetad = adstep;                              // Theta in degrees, reset to initial adstep value
      for ( int innerStep = 1; innerStep <= numInnerSteps; ++innerStep )
      {
         thetar = innerStep*arstep;
         vol += multiplier * sin(thetar);          // Volume of current segment
      }
   }

   vol = vol*1.0; // Volume compensated for any reduction in phi and theta

   rad3 = (3.0*vol)/(4.0*M_PI);    // volume equivalent radius^3
   rad = pow(rad3,(1.0/3.0));      // volume equivalent radius
   vol2 = (4.0/3.0)*M_PI*(maxr*maxr*maxr);     // Calculated volume of a sphere given initial maxr

   // Diagnostic output
   cout << vol2 << " " << vol << " " << ((vol2-vol)/vol)*100.0 << endl << endl;
}

int main()
{
   int numSteps = 3600;
   clock_t start = clock();
   volume1(numSteps);
   clock_t end1 = clock();
   volume2(numSteps);
   clock_t end2 = clock();

   std::cout << "CPU time used: " << 1000.0 * (end1-start) / CLOCKS_PER_SEC << " ms\n";
   std::cout << "CPU time used: " << 1000.0 * (end2-end1) / CLOCKS_PER_SEC << " ms\n";
}

The output I get, using g++ 4.7.3:

4.18879020478639053 4.18762558892993564 0.02781088785811153

4.18879020478639053 4.18878914146923176 0.00002538483372773

CPU time used: 639.00000000000000000 ms
CPU time used: 359.00000000000000000 ms

That gives you an improvement of about 44%.

Speeding up my code for obtaining the volume of a sphere (nested loops)
