I'm working on an optimized Java library, and I'm wondering about code like this:
int rX = rhsOffset;
int rY = rhsOffset + 1;
int rZ = rhsOffset + 2;
int rW = rhsOffset + 3;
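where the local variable rX is redundant but makes the code more readable. In this case, does rX get compiled out in the Java bytecode, or by the JIT at execution time?
I have also seen libraries do this: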
m[offset + 0] = f / aspect;
m[offset + 1] = 0.0f;
m[offset + 2] = 0.0f;
m[offset + 3] = 0.0f;
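where the " + 0" is there only to make the code look more uniform.
I would like to do the same thing, but I want to be sure I'm not hurting performance. I don't know of a good way to determine whether memory is allocated, or whether the arithmetic is actually performed, in either of these cases. In Android Studio you can use the memory profiler, which lets you capture all allocations and inspect them, but IntelliJ doesn't seem to offer that feature, and I assume I can't rely on whatever optimizations the Android build system performs being applied to a normal (non-Android) Java project.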
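Answer 0 (score: 1)
I wrote some code to investigate this experimentally; see my repository on Github.
Summary: I ran some experiments on my 64-bit Ubuntu machine with Oracle JDK 9. As far as I can tell from these particular experiments, (i) redundant variables do not seem to affect the actual run time, and (ii) whether or not you add a redundant 0 does not seem to matter. My advice is not to worry about the kind of performance issue you mention: the just-in-time compiler is probably smart enough for this sort of thing, and poor performance may never become a problem.
For the first question, I ran the experiments with both the Oracle JDK 9 javac compiler and the embeddable Janino compiler. I got similar results with both, which suggests that most of the optimization is probably done by the JIT.
I suggest you run your own experiments on your JVM with a toy example that you consider representative. Or measure directly in your real code, in case poor performance turns out to be a problem.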
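A toy example for such an experiment might look like the sketch below. The class and method names are made up for illustration and do not come from any library. After compiling it, running `javap -c RedundantLocalsDemo` prints the bytecode of both variants, which is one way to see what the compiler kept. As far as I know, javac does little optimization beyond constant folding, so the extra stores will probably show up in the bytecode and any removal would be left to the JIT.

```java
// Hypothetical toy class; names are illustrative only.
final class RedundantLocalsDemo {
    // Variant with named index locals, as in the question.
    static float sumNamed(float[] rhs, int rhsOffset) {
        int rX = rhsOffset;
        int rY = rhsOffset + 1;
        int rZ = rhsOffset + 2;
        int rW = rhsOffset + 3;
        return rhs[rX] + rhs[rY] + rhs[rZ] + rhs[rW];
    }

    // Variant with the index arithmetic written inline.
    static float sumInline(float[] rhs, int rhsOffset) {
        return rhs[rhsOffset] + rhs[rhsOffset + 1]
                + rhs[rhsOffset + 2] + rhs[rhsOffset + 3];
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
        // Both variants read the same four elements in the same order.
        System.out.println(sumNamed(data, 2) + " " + sumInline(data, 2));
    }
}
```

Both methods compute the same value, so any difference you measure between them comes from how the code is compiled and executed, not from the arithmetic.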
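Below are the details of my experiments.
Question 1: Does introducing redundant variables affect execution time?
I introduced a parameter, call it n, that controls the degree of redundant assignment, and wrote a code generator that emits code for a meaningless computation and inserts redundant assignments according to the value of n. For example, for n = 0 it produces the following code: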
public static double eval0(double[] X, double[] Y) {
    double sum = 0.0;
    assert(X.length == Y.length);
    int iters = X.length/3;
    for (int i = 0; i < iters; i++) {
        int at = 3*i;
        double x0 = X[at + 0];
        double x1 = X[at + 1];
        double x2 = X[at + 2];
        double y0 = Y[at + 0];
        double y1 = Y[at + 1];
        double y2 = Y[at + 2];
        double x1y2 = x1*y2;
        double x2y1 = x2*y1;
        double a = x1y2-x2y1;
        double x2y0 = x2*y0;
        double x0y2 = x0*y2;
        double b = x2y0-x0y2;
        double x0y1 = x0*y1;
        double x1y0 = x1*y0;
        double c = x0y1-x1y0;
        sum += a + b + c;
    }
    return sum;
}
And for n = 3, for example, it generates this code:
public static double eval3(double[] X, double[] Y) {
    double sum = 0.0;
    assert(X.length == Y.length);
    int iters = X.length/3;
    for (int i = 0; i < iters; i++) {
        int at = 3*i;
        double x0 = X[at + 0];
        double x1 = X[at + 1];
        double x2 = X[at + 2];
        double y0 = Y[at + 0];
        double y1 = Y[at + 1];
        double y2 = Y[at + 2];
        double x1y2_28 = x1*y2;
        double x1y2_29 = x1y2_28;
        double x1y2_30 = x1y2_29;
        double x1y2 = x1y2_30;
        double x2y1_31 = x2*y1;
        double x2y1_32 = x2y1_31;
        double x2y1_33 = x2y1_32;
        double x2y1 = x2y1_33;
        double a_34 = x1y2-x2y1;
        double a_35 = a_34;
        double a_36 = a_35;
        double a = a_36;
        double x2y0_37 = x2*y0;
        double x2y0_38 = x2y0_37;
        double x2y0_39 = x2y0_38;
        double x2y0 = x2y0_39;
        double x0y2_40 = x0*y2;
        double x0y2_41 = x0y2_40;
        double x0y2_42 = x0y2_41;
        double x0y2 = x0y2_42;
        double b_43 = x2y0-x0y2;
        double b_44 = b_43;
        double b_45 = b_44;
        double b = b_45;
        double x0y1_46 = x0*y1;
        double x0y1_47 = x0y1_46;
        double x0y1_48 = x0y1_47;
        double x0y1 = x0y1_48;
        double x1y0_49 = x1*y0;
        double x1y0_50 = x1y0_49;
        double x1y0_51 = x1y0_50;
        double x1y0 = x1y0_51;
        double c_52 = x0y1-x1y0;
        double c_53 = c_52;
        double c_54 = c_53;
        double c = c_54;
        sum += a + b + c;
    }
    return sum;
}
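These two functions perform exactly the same computation, but one of them has many more redundant assignments. Finally, I also generated a dispatch function: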
public double eval(int n, double[] X, double[] Y) {
    switch (n) {
        case 0: return eval0(X, Y);
        case 1: return eval1(X, Y);
        case 2: return eval2(X, Y);
        case 3: return eval3(X, Y);
        case 4: return eval4(X, Y);
        case 5: return eval5(X, Y);
        case 8: return eval8(X, Y);
        case 11: return eval11(X, Y);
        case 15: return eval15(X, Y);
        case 21: return eval21(X, Y);
        case 29: return eval29(X, Y);
        case 40: return eval40(X, Y);
        case 57: return eval57(X, Y);
        case 79: return eval79(X, Y);
        case 111: return eval111(X, Y);
        case 156: return eval156(X, Y);
        case 218: return eval218(X, Y);
        case 305: return eval305(X, Y);
    }
    assert(false);
    return -1;
}
All of the generated code is in my repo here.
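To make the idea of the generator concrete, here is a minimal sketch of how such redundant assignment chains could be emitted. It is not the generator from the repository; the class name, method name, and the way the suffix counter is handled are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a generator for redundant assignment chains.
// Not the generator from the repository; all names are illustrative.
final class RedundantChainGenerator {
    private int counter; // next numeric suffix for intermediate variable names

    RedundantChainGenerator(int firstSuffix) {
        this.counter = firstSuffix;
    }

    // Emits "double name = expr;" routed through n redundant intermediates.
    List<String> assign(String name, String expr, int n) {
        List<String> lines = new ArrayList<>();
        String previous = expr;
        for (int i = 0; i < n; i++) {
            String tmp = name + "_" + counter++;
            lines.add("double " + tmp + " = " + previous + ";");
            previous = tmp;
        }
        lines.add("double " + name + " = " + previous + ";");
        return lines;
    }

    public static void main(String[] args) {
        RedundantChainGenerator gen = new RedundantChainGenerator(28);
        gen.assign("x1y2", "x1*y2", 3).forEach(System.out::println);
    }
}
```

With a starting suffix of 28 and n = 3, the call in `main` produces the same four lines that eval3 uses for x1y2; with n = 0 it emits the single direct assignment seen in eval0.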
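I then benchmarked all of these functions for the different values of n, on X and Y arrays of size 10000 filled with random data. I did this with both the Oracle JDK 9 javac compiler and the embedded Janino compiler. My benchmark code also lets the JIT warm up a bit. Running the benchmark produces this output: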
------ USING JAVAC
n = 0
"Elapsed time: 0.067189 msecs"
Result= -9.434172113697462
n = 1
"Elapsed time: 0.05514 msecs"
Result= -9.434172113697462
n = 2
"Elapsed time: 0.04627 msecs"
Result= -9.434172113697462
n = 3
"Elapsed time: 0.041316 msecs"
Result= -9.434172113697462
n = 4
"Elapsed time: 0.038673 msecs"
Result= -9.434172113697462
n = 5
"Elapsed time: 0.036372 msecs"
Result= -9.434172113697462
n = 8
"Elapsed time: 0.203788 msecs"
Result= -9.434172113697462
n = 11
"Elapsed time: 0.031491 msecs"
Result= -9.434172113697462
n = 15
"Elapsed time: 0.032673 msecs"
Result= -9.434172113697462
n = 21
"Elapsed time: 0.030722 msecs"
Result= -9.434172113697462
n = 29
"Elapsed time: 0.039271 msecs"
Result= -9.434172113697462
n = 40
"Elapsed time: 0.030785 msecs"
Result= -9.434172113697462
n = 57
"Elapsed time: 0.032382 msecs"
Result= -9.434172113697462
n = 79
"Elapsed time: 0.033021 msecs"
Result= -9.434172113697462
n = 111
"Elapsed time: 0.029978 msecs"
Result= -9.434172113697462
n = 156
"Elapsed time: 18.003687 msecs"
Result= -9.434172113697462
n = 218
"Elapsed time: 24.163828 msecs"
Result= -9.434172113697462
n = 305
"Elapsed time: 33.479853 msecs"
Result= -9.434172113697462
------ USING JANINO
n = 0
"Elapsed time: 0.032084 msecs"
Result= -9.434172113697462
n = 1
"Elapsed time: 0.032022 msecs"
Result= -9.434172113697462
n = 2
"Elapsed time: 0.029989 msecs"
Result= -9.434172113697462
n = 3
"Elapsed time: 0.034251 msecs"
Result= -9.434172113697462
n = 4
"Elapsed time: 0.030606 msecs"
Result= -9.434172113697462
n = 5
"Elapsed time: 0.030186 msecs"
Result= -9.434172113697462
n = 8
"Elapsed time: 0.032132 msecs"
Result= -9.434172113697462
n = 11
"Elapsed time: 0.030109 msecs"
Result= -9.434172113697462
n = 15
"Elapsed time: 0.031009 msecs"
Result= -9.434172113697462
n = 21
"Elapsed time: 0.032625 msecs"
Result= -9.434172113697462
n = 29
"Elapsed time: 0.031489 msecs"
Result= -9.434172113697462
n = 40
"Elapsed time: 0.030665 msecs"
Result= -9.434172113697462
n = 57
"Elapsed time: 0.03146 msecs"
Result= -9.434172113697462
n = 79
"Elapsed time: 0.031599 msecs"
Result= -9.434172113697462
n = 111
"Elapsed time: 0.029998 msecs"
Result= -9.434172113697462
n = 156
"Elapsed time: 17.579771 msecs"
Result= -9.434172113697462
n = 218
"Elapsed time: 24.561065 msecs"
Result= -9.434172113697462
n = 305
"Elapsed time: 33.357928 msecs"
Result= -9.434172113697462
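From the output above, javac and Janino appear to produce equally fast code, and for low values of n the value does not seem to matter. At n = 156, however, the run time jumps sharply, from roughly 0.03 msecs to roughly 18 msecs. I don't know why this happens, but I suspect it has to do with a limit on the number of local variables on the JVM, so that the Java compilers (javac/Janino) have to use workarounds to get past that limit, and those workarounds are harder for the JIT to optimize (this is only my suspicion; maybe someone can shed some light on it). Another candidate worth checking, which I have not verified, is that HotSpot by default does not JIT-compile methods whose bytecode exceeds 8000 bytes (this can be turned off with -XX:-DontCompileHugeMethods), which would leave the largest generated methods running in the interpreter.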
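The "warm up, then measure" procedure used for these numbers can be sketched roughly as follows. This is not the benchmark code from the repository: the class, the `crossSum` kernel (a compact stand-in for the generated eval functions), and the warm-up count are all assumptions chosen to keep the example self-contained.

```java
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

// Hypothetical timing harness; a sketch, not the repository's benchmark.
final class WarmupBenchmark {
    // Compact stand-in for the generated eval functions (same arithmetic).
    static double crossSum(double[] X, double[] Y) {
        double sum = 0.0;
        for (int i = 0; i + 2 < X.length; i += 3) {
            double a = X[i + 1] * Y[i + 2] - X[i + 2] * Y[i + 1];
            double b = X[i + 2] * Y[i] - X[i] * Y[i + 2];
            double c = X[i] * Y[i + 1] - X[i + 1] * Y[i];
            sum += a + b + c;
        }
        return sum;
    }

    static double[] randomArray(int size, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[size];
        for (int i = 0; i < size; i++) out[i] = rng.nextDouble();
        return out;
    }

    // Calls f warmupRuns times untimed, then times a single call (in msecs).
    static double timeMillis(ToDoubleBiFunction<double[], double[]> f,
                             double[] X, double[] Y, int warmupRuns) {
        double sink = 0.0;
        for (int i = 0; i < warmupRuns; i++) sink += f.applyAsDouble(X, Y);
        long start = System.nanoTime();
        sink += f.applyAsDouble(X, Y);
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the computation.
        if (Double.isNaN(sink)) System.out.println("unexpected NaN");
        return elapsed / 1e6;
    }

    public static void main(String[] args) {
        double[] X = randomArray(10000, 1L);
        double[] Y = randomArray(10000, 2L);
        double ms = timeMillis(WarmupBenchmark::crossSum, X, Y, 10000);
        System.out.println("Elapsed time: " + ms + " msecs");
    }
}
```

Hand-rolled timing like this is noisy, as the outliers in the output above show; for measurements you intend to rely on, a dedicated harness such as JMH is generally a safer choice.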
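Question 2: Does adding a redundant 0 affect performance?
I wrote a class to experiment with this. The class has two static methods that perform exactly the same computation, except that in apply0 we also add 0 when computing the array indices: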
public class Mul2d {
    public static double[] apply0(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at + 0] = cosv*X[at + 0] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at + 0] + cosv*X[at + 1];
        }
        return Y;
    }
    public static double[] apply(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at] = cosv*X[at] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at] + cosv*X[at + 1];
        }
        return Y;
    }
}
Running the benchmark on a large array shows that it makes no difference whether you add 0 or not. Here is the benchmark output:
With adding '+ 0'
"Elapsed time: 0.247315 msecs"
"Elapsed time: 0.235471 msecs"
"Elapsed time: 0.240675 msecs"
"Elapsed time: 0.251799 msecs"
"Elapsed time: 0.267139 msecs"
"Elapsed time: 0.250735 msecs"
"Elapsed time: 0.251697 msecs"
"Elapsed time: 0.238652 msecs"
"Elapsed time: 0.24872 msecs"
"Elapsed time: 1.274368 msecs"
Without adding '+ 0'
"Elapsed time: 0.239371 msecs"
"Elapsed time: 0.233459 msecs"
"Elapsed time: 0.228619 msecs"
"Elapsed time: 0.389649 msecs"
"Elapsed time: 0.238742 msecs"
"Elapsed time: 0.23459 msecs"
"Elapsed time: 0.23452 msecs"
"Elapsed time: 0.241013 msecs"
"Elapsed time: 0.356035 msecs"
"Elapsed time: 0.260892 msecs"
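The run times look nearly identical; any difference seems to be lost in the noise.
Conclusion: Regarding question 1, I could not detect any negative effect on performance for this particular toy problem.
Regarding question 2, whether or not you add + 0 does not seem to matter. Either the JIT optimizes the + 0 away, or the other computations in the loop dominate the total time, in which case any small extra cost of adding + 0 drowns in the noise.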