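I am trying to implement Fast Inverse Square Root in Java to speed up vector normalization. However, when I implement the single-precision version in Java, I first get roughly the same speed as 1F / (float)Math.sqrt(), and then it quickly drops to half that speed. This is interesting because, while Math.sqrt uses (I presume) a native method, it also involves a floating-point division, which I have heard is really slow. My code for computing the numbers is as follows: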
public static float fastInverseSquareRoot(float x){
float xHalf = 0.5F * x;
int temp = Float.floatToRawIntBits(x);
temp = 0x5F3759DF - (temp >> 1);
float newX = Float.intBitsToFloat(temp);
newX = newX * (1.5F - xHalf * newX * newX);
return newX;
}
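Using a short program I wrote that iterates 16 million times for each method, aggregates the results, and then repeats, I get results like this: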
1F / Math.sqrt() took 65209490 nanoseconds.
Fast Inverse Square Root took 65456128 nanoseconds.
Fast Inverse Square Root was 0.378224 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 64131293 nanoseconds.
Fast Inverse Square Root took 26214534 nanoseconds.
Fast Inverse Square Root was 59.123647 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 27312205 nanoseconds.
Fast Inverse Square Root took 56234714 nanoseconds.
Fast Inverse Square Root was 105.895914 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 26493281 nanoseconds.
Fast Inverse Square Root took 56004783 nanoseconds.
Fast Inverse Square Root was 111.392402 percent slower than 1F / Math.sqrt()
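I consistently get a first iteration where the two are about the same speed, then an iteration where Fast Inverse Square Root saves about 60 percent of the time required by 1F / Math.sqrt(), then several iterations where Fast Inverse Square Root takes about twice as long as the control. I am confused why FISR would go from the same -> 60 percent faster -> 100 percent slower, and it happens every time I run my program.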
EDIT: The data above is from running it in Eclipse. When I run the program with javac/java I get completely different data:
1F / Math.sqrt() took 57870498 nanoseconds.
Fast Inverse Square Root took 88206794 nanoseconds.
Fast Inverse Square Root was 52.421004 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 54982400 nanoseconds.
Fast Inverse Square Root took 83777562 nanoseconds.
Fast Inverse Square Root was 52.371599 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 21115822 nanoseconds.
Fast Inverse Square Root took 76705152 nanoseconds.
Fast Inverse Square Root was 263.259133 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 20159210 nanoseconds.
Fast Inverse Square Root took 80745616 nanoseconds.
Fast Inverse Square Root was 300.539585 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 21814675 nanoseconds.
Fast Inverse Square Root took 85261648 nanoseconds.
Fast Inverse Square Root was 290.845374 percent slower than 1F / Math.sqrt()
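EDIT2: After a few responses, it seems that the speed stabilizes after several iterations, but the number it stabilizes at is highly volatile. Does anyone have any idea why?
Here is my code (not exactly concise, but it is the whole thing):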
public class FastInverseSquareRootTest {
public static FastInverseSquareRootTest conductTest() {
float result = 0F;
long startTime, endTime, midTime;
startTime = System.nanoTime();
for (float x = 1F; x < 4_000_000F; x += 0.25F) {
result = 1F / (float) Math.sqrt(x);
}
midTime = System.nanoTime();
for (float x = 1F; x < 4_000_000F; x += 0.25F) {
result = fastInverseSquareRoot(x);
}
endTime = System.nanoTime();
return new FastInverseSquareRootTest(midTime - startTime, endTime
- midTime);
}
public static float fastInverseSquareRoot(float x) {
float xHalf = 0.5F * x;
int temp = Float.floatToRawIntBits(x);
temp = 0x5F3759DF - (temp >> 1);
float newX = Float.intBitsToFloat(temp);
newX = newX * (1.5F - xHalf * newX * newX);
return newX;
}
public static void main(String[] args) throws Exception {
for (int i = 0; i < 7; i++) {
System.out.println(conductTest().toString());
}
}
private long controlDiff;
private long experimentalDiff;
private double percentError;
public FastInverseSquareRootTest(long controlDiff, long experimentalDiff) {
this.experimentalDiff = experimentalDiff;
this.controlDiff = controlDiff;
this.percentError = 100D * (experimentalDiff - controlDiff)
/ controlDiff;
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append(String.format("1F / Math.sqrt() took %d nanoseconds.%n",
controlDiff));
sb.append(String.format(
"Fast Inverse Square Root took %d nanoseconds.%n",
experimentalDiff));
sb.append(String
.format("Fast Inverse Square Root was %f percent %s than 1F / Math.sqrt()%n",
Math.abs(percentError), percentError > 0D ? "slower"
: "faster"));
return sb.toString();
}
}
Answer 0 (score: 11)
The JIT optimizer seems to have thrown away the call to Math.sqrt.
With your unmodified code, I got
1F / Math.sqrt() took 65358495 nanoseconds.
Fast Inverse Square Root took 77152791 nanoseconds.
Fast Inverse Square Root was 18,045544 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 52872498 nanoseconds.
Fast Inverse Square Root took 75242075 nanoseconds.
Fast Inverse Square Root was 42,308531 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 23386359 nanoseconds.
Fast Inverse Square Root took 73532080 nanoseconds.
Fast Inverse Square Root was 214,422951 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 23790209 nanoseconds.
Fast Inverse Square Root took 76254902 nanoseconds.
Fast Inverse Square Root was 220,530610 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 23885467 nanoseconds.
Fast Inverse Square Root took 74869636 nanoseconds.
Fast Inverse Square Root was 213,452678 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 23473514 nanoseconds.
Fast Inverse Square Root took 73063699 nanoseconds.
Fast Inverse Square Root was 211,260168 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 23738564 nanoseconds.
Fast Inverse Square Root took 71917013 nanoseconds.
Fast Inverse Square Root was 202,954353 percent slower than 1F / Math.sqrt()
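The times for fastInverseSquareRoot are consistently slower and all in the same ballpark, while the times for Math.sqrt speed up considerably.
Changing the code so that the call to Math.sqrt cannot be avoided,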
for (float x = 1F; x < 4_000_000F; x += 0.25F) {
result += 1F / (float) Math.sqrt(x);
}
midTime = System.nanoTime();
for (float x = 1F; x < 4_000_000F; x += 0.25F) {
result -= fastInverseSquareRoot(x);
}
endTime = System.nanoTime();
if (result == 0) System.out.println("Wow!");
I got
1F / Math.sqrt() took 184884684 nanoseconds.
Fast Inverse Square Root took 85298761 nanoseconds.
Fast Inverse Square Root was 53,863804 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 182183542 nanoseconds.
Fast Inverse Square Root took 83040574 nanoseconds.
Fast Inverse Square Root was 54,419278 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 165269658 nanoseconds.
Fast Inverse Square Root took 81922280 nanoseconds.
Fast Inverse Square Root was 50,431143 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 163272877 nanoseconds.
Fast Inverse Square Root took 81906141 nanoseconds.
Fast Inverse Square Root was 49,834815 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 165314846 nanoseconds.
Fast Inverse Square Root took 81124465 nanoseconds.
Fast Inverse Square Root was 50,927296 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 164079534 nanoseconds.
Fast Inverse Square Root took 80453629 nanoseconds.
Fast Inverse Square Root was 50,966689 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 162350821 nanoseconds.
Fast Inverse Square Root took 79854355 nanoseconds.
Fast Inverse Square Root was 50,813704 percent faster than 1F / Math.sqrt()
Much slower times for Math.sqrt, and only mildly slower times for fastInverseSquareRoot (which now has to do a subtraction in each iteration).
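The idea above, making every computed value feed into something the program later uses so the JIT cannot discard the call, can be sketched as a small standalone harness. This is only an illustration under assumptions: the class name SinkBenchmark and the methods sumControl and sumFast are hypothetical and do not appear in the original post, and a double accumulator is used here instead of the float in the snippet above so that small terms are not lost to rounding.

```java
// Hypothetical harness: class and method names are illustrative only.
final class SinkBenchmark {

    // Same bit trick as in the question, followed by one Newton-Raphson step.
    static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int bits = Float.floatToRawIntBits(x);
        bits = 0x5F3759DF - (bits >> 1);
        float y = Float.intBitsToFloat(bits);
        return y * (1.5F - xHalf * y * y);
    }

    // Sums 1/sqrt(x) for x = 1, 1.25, 1.5, ... below limit.
    // Returning the sum keeps every Math.sqrt call observable.
    static double sumControl(float limit) {
        double sum = 0D;
        for (float x = 1F; x < limit; x += 0.25F) {
            sum += 1F / (float) Math.sqrt(x);
        }
        return sum;
    }

    // Same loop, using the approximation instead.
    static double sumFast(float limit) {
        double sum = 0D;
        for (float x = 1F; x < limit; x += 0.25F) {
            sum += fastInverseSquareRoot(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        final float limit = 4_000_000F;
        for (int round = 0; round < 7; round++) {
            long start = System.nanoTime();
            double control = sumControl(limit);
            long mid = System.nanoTime();
            double fast = sumFast(limit);
            long end = System.nanoTime();
            // Printing the sums means the results are actually consumed.
            System.out.printf("control %d ns, fast %d ns, sums %.3f / %.3f%n",
                    mid - start, end - mid, control, fast);
        }
    }
}
```

Because both sums are printed, neither loop body is dead code. The two sums should also agree closely, which doubles as a sanity check on the approximation.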
Answer 1 (score: 0)
The output of the posted code on my machine is:
1F / Math.sqrt() took 165769968 nanoseconds.
Fast Inverse Square Root took 251809517 nanoseconds.
Fast Inverse Square Root was 51.902977 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 162953919 nanoseconds.
Fast Inverse Square Root took 251212721 nanoseconds.
Fast Inverse Square Root was 54.161816 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 161524902 nanoseconds.
Fast Inverse Square Root took 36242909 nanoseconds.
Fast Inverse Square Root was 77.562030 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 162289014 nanoseconds.
Fast Inverse Square Root took 36552036 nanoseconds.
Fast Inverse Square Root was 77.477196 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 163157620 nanoseconds.
Fast Inverse Square Root took 36152720 nanoseconds.
Fast Inverse Square Root was 77.841844 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 162511997 nanoseconds.
Fast Inverse Square Root took 36426705 nanoseconds.
Fast Inverse Square Root was 77.585221 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 162302698 nanoseconds.
Fast Inverse Square Root took 36797410 nanoseconds.
Fast Inverse Square Root was 77.327912 percent faster than 1F / Math.sqrt()
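It seems the JIT kicked in and performance improved nearly tenfold (by the numbers above, the Fast Inverse Square Root time drops from about 251 ms to about 36 ms, roughly sevenfold). Hopefully someone with a better grasp of the JIT will come along and explain this. My environment: Java 6, Eclipse.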
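Since the first couple of rounds above are clearly measured before the JIT has finished its work, one common way to get steadier numbers is to run some untimed warm-up rounds first and then report the median of the timed rounds. The following is a minimal sketch of that approach, not the original poster's method; the class WarmupTimer, its methods, and the round counts are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

// Hypothetical helper: names and round counts are illustrative only.
final class WarmupTimer {

    // Writing each result to a volatile field keeps the work observable.
    static volatile double sink;

    // Median of the samples (upper middle element for even-length input).
    static long medianNanos(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Runs the task untimed warmupRounds times, then times measuredRounds runs.
    static long measure(DoubleSupplier task, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            sink = task.getAsDouble();
        }
        long[] samples = new long[measuredRounds];
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            sink = task.getAsDouble();
            samples[i] = System.nanoTime() - start;
        }
        return medianNanos(samples);
    }

    public static void main(String[] args) {
        long ns = measure(() -> {
            double sum = 0D;
            for (float x = 1F; x < 4_000_000F; x += 0.25F) {
                sum += 1F / (float) Math.sqrt(x);
            }
            return sum;
        }, 5, 7);
        System.out.println("median after warm-up: " + ns + " nanoseconds");
    }
}
```

A hand-rolled loop like this is still only approximate. A dedicated harness such as JMH handles warm-up, forking, and dead-code elimination more rigorously.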
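Answer 2 (score: 0)
My JIT seems to speed things up in two steps: the first is probably an algorithmic optimization and the second is probably an assembly-level optimization.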
1F / Math.sqrt() took 78202645 nanoseconds.
Fast Inverse Square Root took 79248400 nanoseconds.
Fast Inverse Square Root was 1,337237 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 76856008 nanoseconds.
Fast Inverse Square Root took 24788247 nanoseconds.
Fast Inverse Square Root was 67,747158 percent faster than 1F / Math.sqrt()
1F / Math.sqrt() took 24162119 nanoseconds.
Fast Inverse Square Root took 70651968 nanoseconds.
Fast Inverse Square Root was 192,407996 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 24163301 nanoseconds.
Fast Inverse Square Root took 70598983 nanoseconds.
Fast Inverse Square Root was 192,174414 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 24201621 nanoseconds.
Fast Inverse Square Root took 70667344 nanoseconds.
Fast Inverse Square Root was 191,994259 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 24219835 nanoseconds.
Fast Inverse Square Root took 70698568 nanoseconds.
Fast Inverse Square Root was 191,903591 percent slower than 1F / Math.sqrt()
1F / Math.sqrt() took 24231663 nanoseconds.
Fast Inverse Square Root took 70633991 nanoseconds.
Fast Inverse Square Root was 191,494608 percent slower than 1F / Math.sqrt()
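One way to check whether the step changes above coincide with JIT activity is to read the JVM's accumulated compilation time before and after each round. The sketch below uses the standard java.lang.management.CompilationMXBean. The class JitTimeProbe and its workload method are hypothetical names introduced here, and whether a given JVM reports compilation time at all is an assumption that the code checks at run time.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: class and method names are illustrative only.
final class JitTimeProbe {

    // Accumulated JIT compile time in milliseconds, or -1 if the JVM does not report it.
    static long totalCompileMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return jit.getTotalCompilationTime();
    }

    // Control workload from the question; the sum is returned so it stays live.
    static double workload(float limit) {
        double sum = 0D;
        for (float x = 1F; x < limit; x += 0.25F) {
            sum += 1F / (float) Math.sqrt(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 7; round++) {
            long jitBefore = totalCompileMillis();
            long start = System.nanoTime();
            double sum = workload(4_000_000F);
            long elapsed = System.nanoTime() - start;
            long jitAfter = totalCompileMillis();
            // The JIT delta is 0 whenever the JVM reports -1 (unsupported) both times.
            System.out.printf("round %d: %d ns, JIT time +%d ms, sum %.3f%n",
                    round, elapsed, jitAfter - jitBefore, sum);
        }
    }
}
```

If the JIT delta is nonzero only in the rounds where the timings shift, that supports the reading that each step corresponds to a recompilation. On HotSpot, running with -XX:+PrintCompilation gives a per-method view of the same events.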