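My physics engine has long been plagued by one very stubborn problem: I am having a hard time computing thousands of collisions at once. I have already optimized the method responsible for telling me about a collision; it creates 0 objects and at this point is nothing but multiplications and comparisons, yet it is still not fast enough!
*Note: please don't pick apart the structure of my physics engine. My project currently adds physics to Minecraft, a game made up of millions of cubes. As you can imagine, this creates some unique challenges for a simulation like this.*
For context, a Polygon is an array of 8 Vectors; a Vector is just a vector... The dot product of 2 vectors is (v1.x * v2.x + v1.y * v2.y + v1.z * v2.z). Anyway, the code here takes up 10% of all processing time!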
public class ReusableCollisionObject{
    public boolean seperated;
    public double movMaxFixMin,movMinFixMax;
    private static double maxPlayer,minPlayer,maxBlock,minBlock,dot;
    public void generateCollision(Polygon movable_,Polygon stationary,Vector axes){
        maxPlayer = minPlayer = axes.dot(movable_.vertices[0]);
        dot = axes.dot(movable_.vertices[1]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[2]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[3]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[4]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[5]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[6]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        dot = axes.dot(movable_.vertices[7]);
        if(dot>maxPlayer){
            maxPlayer = dot;
        }
        if(dot<minPlayer){
            minPlayer = dot;
        }
        maxBlock = minBlock = axes.dot(stationary.vertices[0]);
        dot = axes.dot(stationary.vertices[1]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[2]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[3]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[4]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[5]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[6]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        dot = axes.dot(stationary.vertices[7]);
        if(dot>maxBlock){
            maxBlock = dot;
        }
        if(dot<minBlock){
            minBlock = dot;
        }
        seperated = minPlayer>maxBlock||maxPlayer<minBlock;
    }
}
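Is it even possible to make raw math like this run any faster?
EDIT: Thanks to the answers I got, I reorganized the operations for performance and converted all the doubles to floats. Here is the new, more optimized class.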
public class ReusableCollisionObject{
    public boolean seperated;
    public double movMaxFixMin,movMinFixMax;
    private static double maxPlayer,minPlayer,maxBlock,minBlock;
    private static final float[] cachemovable_ = new float[16];
    public void generateCollision(Polygon movable_,Polygon stationary,Vector axes){
        cachemovable_[0] = axes.X*movable_.vertices[0].X+axes.Y*movable_.vertices[0].Y+axes.Z*movable_.vertices[0].Z;
        cachemovable_[1] = axes.X*movable_.vertices[1].X+axes.Y*movable_.vertices[1].Y+axes.Z*movable_.vertices[1].Z;
        cachemovable_[2] = axes.X*movable_.vertices[2].X+axes.Y*movable_.vertices[2].Y+axes.Z*movable_.vertices[2].Z;
        cachemovable_[3] = axes.X*movable_.vertices[3].X+axes.Y*movable_.vertices[3].Y+axes.Z*movable_.vertices[3].Z;
        cachemovable_[4] = axes.X*movable_.vertices[4].X+axes.Y*movable_.vertices[4].Y+axes.Z*movable_.vertices[4].Z;
        cachemovable_[5] = axes.X*movable_.vertices[5].X+axes.Y*movable_.vertices[5].Y+axes.Z*movable_.vertices[5].Z;
        cachemovable_[6] = axes.X*movable_.vertices[6].X+axes.Y*movable_.vertices[6].Y+axes.Z*movable_.vertices[6].Z;
        cachemovable_[7] = axes.X*movable_.vertices[7].X+axes.Y*movable_.vertices[7].Y+axes.Z*movable_.vertices[7].Z;
        cachemovable_[8] = axes.X*stationary.vertices[0].X+axes.Y*stationary.vertices[0].Y+axes.Z*stationary.vertices[0].Z;
        cachemovable_[9] = axes.X*stationary.vertices[1].X+axes.Y*stationary.vertices[1].Y+axes.Z*stationary.vertices[1].Z;
        cachemovable_[10] = axes.X*stationary.vertices[2].X+axes.Y*stationary.vertices[2].Y+axes.Z*stationary.vertices[2].Z;
        cachemovable_[11] = axes.X*stationary.vertices[3].X+axes.Y*stationary.vertices[3].Y+axes.Z*stationary.vertices[3].Z;
        cachemovable_[12] = axes.X*stationary.vertices[4].X+axes.Y*stationary.vertices[4].Y+axes.Z*stationary.vertices[4].Z;
        cachemovable_[13] = axes.X*stationary.vertices[5].X+axes.Y*stationary.vertices[5].Y+axes.Z*stationary.vertices[5].Z;
        cachemovable_[14] = axes.X*stationary.vertices[6].X+axes.Y*stationary.vertices[6].Y+axes.Z*stationary.vertices[6].Z;
        cachemovable_[15] = axes.X*stationary.vertices[7].X+axes.Y*stationary.vertices[7].Y+axes.Z*stationary.vertices[7].Z;
        maxPlayer = minPlayer = cachemovable_[0];
        maxBlock = minBlock = cachemovable_[8];
        if(cachemovable_[1]>maxPlayer){
            maxPlayer = cachemovable_[1];
        }
        if(cachemovable_[1]<minPlayer){
            minPlayer = cachemovable_[1];
        }
        if(cachemovable_[2]>maxPlayer){
            maxPlayer = cachemovable_[2];
        }
        if(cachemovable_[2]<minPlayer){
            minPlayer = cachemovable_[2];
        }
        if(cachemovable_[3]>maxPlayer){
            maxPlayer = cachemovable_[3];
        }
        if(cachemovable_[3]<minPlayer){
            minPlayer = cachemovable_[3];
        }
        if(cachemovable_[4]>maxPlayer){
            maxPlayer = cachemovable_[4];
        }
        if(cachemovable_[4]<minPlayer){
            minPlayer = cachemovable_[4];
        }
        if(cachemovable_[5]>maxPlayer){
            maxPlayer = cachemovable_[5];
        }
        if(cachemovable_[5]<minPlayer){
            minPlayer = cachemovable_[5];
        }
        if(cachemovable_[6]>maxPlayer){
            maxPlayer = cachemovable_[6];
        }
        if(cachemovable_[6]<minPlayer){
            minPlayer = cachemovable_[6];
        }
        if(cachemovable_[7]>maxPlayer){
            maxPlayer = cachemovable_[7];
        }
        if(cachemovable_[7]<minPlayer){
            minPlayer = cachemovable_[7];
        }
        if(cachemovable_[9]>maxBlock){
            maxBlock = cachemovable_[9];
        }
        if(cachemovable_[9]<minBlock){
            minBlock = cachemovable_[9];
        }
        if(cachemovable_[10]>maxBlock){
            maxBlock = cachemovable_[10];
        }
        if(cachemovable_[10]<minBlock){
            minBlock = cachemovable_[10];
        }
        if(cachemovable_[11]>maxBlock){
            maxBlock = cachemovable_[11];
        }
        if(cachemovable_[11]<minBlock){
            minBlock = cachemovable_[11];
        }
        if(cachemovable_[12]>maxBlock){
            maxBlock = cachemovable_[12];
        }
        if(cachemovable_[12]<minBlock){
            minBlock = cachemovable_[12];
        }
        if(cachemovable_[13]>maxBlock){
            maxBlock = cachemovable_[13];
        }
        if(cachemovable_[13]<minBlock){
            minBlock = cachemovable_[13];
        }
        if(cachemovable_[14]>maxBlock){
            maxBlock = cachemovable_[14];
        }
        if(cachemovable_[14]<minBlock){
            minBlock = cachemovable_[14];
        }
        if(cachemovable_[15]>maxBlock){
            maxBlock = cachemovable_[15];
        }
        if(cachemovable_[15]<minBlock){
            minBlock = cachemovable_[15];
        }
        seperated = minPlayer>maxBlock||maxPlayer<minBlock;
    }
}
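Answer 0 (score: 2)
Call the method less often. That gives you a far better big-O scalability gain than merely making the method a little faster. When a profiler says a method is slow, there are two ways to fix it: make it faster or call it less.
How?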
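Suppose you check 1000 objects for collisions. I presume your current code checks every combination for a collision, so that is about 500000 combinations (AB, AC, AD, ..., BC, BD, ..., CD, ...), and therefore that many calls to the method.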
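As a quick check on that figure, assuming every unordered pair is tested exactly once, the count for n = 1000 objects works out as follows:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}
\qquad\Longrightarrow\qquad
\binom{1000}{2} = \frac{1000 \cdot 999}{2} = 499\,500 \approx 500\,000
```

The count grows quadratically, so doubling the number of objects roughly quadruples the number of calls.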
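What if you knew beforehand which combinations can never collide? In a one-dimensional space, a NavigableMap (plain Java) can help you. In a multi-dimensional space you need something like a k-d tree (or just apply the idea to a single dimension, which is already a nice gain).
For example, looking at 1 dimension only: given an object A at position 137.4 (in that dimension) with a speed of 20.3, it can end up anywhere between 117.1 and 157.7. So let's put the lowest number in the map: navigableMap.put(117.1, A). Now, if B ends up anywhere between 50.4 and 70.4, we can ask for navigableMap.headMap(70.4, true), which contains neither A (its key is 117.1) nor any other element whose lowest number is above 70.4.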
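The one-dimensional pruning described above could be sketched roughly like this. It is only an illustration under assumptions: the names BroadPhase1D, Span, add and candidates are hypothetical and not from the question, and each object is assumed to be reducible to a position and a maximum displacement on one axis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class BroadPhase1D {
    /** Hypothetical record of the interval an object can reach on one axis this tick. */
    static final class Span {
        final String name;
        final double min, max;

        Span(String name, double position, double speed) {
            this.name = name;
            this.min = position - speed;
            this.max = position + speed;
        }
    }

    // Keyed by each span's lowest reachable coordinate; a bucket handles equal keys.
    private final NavigableMap<Double, List<Span>> byMin = new TreeMap<>();

    void add(Span s) {
        byMin.computeIfAbsent(s.min, k -> new ArrayList<>()).add(s);
    }

    /** Returns the spans that may overlap the query on this axis; all others need no narrow-phase call. */
    List<Span> candidates(Span query) {
        List<Span> out = new ArrayList<>();
        // headMap(query.max, true) leaves out every span that starts above query.max.
        for (List<Span> bucket : byMin.headMap(query.max, true).values()) {
            for (Span s : bucket) {
                // Spans that end below query.min cannot overlap either.
                if (s != query && s.max >= query.min) {
                    out.add(s);
                }
            }
        }
        return out;
    }
}
```

With A (137.4, speed 20.3) and B (60.4, speed 10.0) in the map, candidates(B) never visits A, so the expensive per-axis test is skipped for that pair. The map has to be rebuilt or updated as objects move, which is a cost this sketch does not address.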
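Answer 1 (score: 0)
I'm going to assume you are calling the method as few times as you can; if that might not be the case, look at Geoffrey De Smet's answer and reduce the number of times you call it! Even if you think you are already calling it as little as possible, check his answer first, just in case!
Also, I'm assuming you are not using strictfp; if you are, don't. You don't need strictly IEEE 754-compliant floating point. It is usually not the default, but I wanted to mention it just in case. (From Java 17 onward all floating-point arithmetic is strict and the modifier has no effect, so this only matters on older JVMs.)
Here are a few ways to lower the cost of floating-point code without redesigning the algorithm (though their contribution is far smaller than that of an algorithmic redesign, if one is possible):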
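First, precompute and save as much of the calculation as you can, since memory is usually more readily available than CPU; in many cases even partial calculations can be saved.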
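One way to read this advice for the code in the question: a stationary block does not move, so its projection onto a given axis can be computed once and reused for every movable polygon tested against it. The sketch below assumes a flat float[] vertex layout (x0, y0, z0, x1, ...) and uses the hypothetical names ProjectionCache, Interval, project and separated; none of these come from the question.

```java
final class ProjectionCache {
    /** Hypothetical holder for a polygon's projected interval on one axis. */
    static final class Interval {
        final float min, max;

        Interval(float min, float max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Projects vertices stored as x0,y0,z0,x1,y1,z1,... onto the axis (ax, ay, az). */
    static Interval project(float[] xyz, float ax, float ay, float az) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < xyz.length; i += 3) {
            float d = ax * xyz[i] + ay * xyz[i + 1] + az * xyz[i + 2];
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        return new Interval(min, max);
    }

    /** Same comparison as the question's final line, applied to two already projected intervals. */
    static boolean separated(Interval a, Interval b) {
        return a.min > b.max || a.max < b.min;
    }
}
```

The Interval for a stationary block would be created once per axis and kept, so the per-test work drops to projecting the movable polygon plus two comparisons. This does allocate one small object per cached projection, which trades against the zero-allocation goal stated in the question.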
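Also, profile using float instead of double. Floats mean less memory to move around. Java tends to build tree-like object graphs that are scattered across memory, rather than flat objects that keep everything in one compact package fitting into CPU cache lines, so reducing the size of your data types increases the amount of data that fits in the CPU cache.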
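Try to get the code auto-vectorized in groups of 8 calculations, since modern CPUs with the SSE / SSE2 / SSE3 / AVX extensions can perform 8 to 32 floating-point calculations at once in the time a single floating-point calculation takes on the old 80x87-compatible floating-point architecture. If your Java JIT/JVM supports auto-vectorization, you may need to rewrite the code so that the JIT/JVM recognizes it as vectorizable. If your JIT/JVM does not support auto-vectorization, consider using a third-party library that exposes block native vector operations: compute all the dot products in one operation, load them into a list, and then use min/max to find the limits.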
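A loop shape that a JIT has a better chance of vectorizing might look like the sketch below. It assumes the 16 vertices (8 movable, then 8 stationary) are stored in three parallel float arrays instead of the question's Polygon and Vector objects; BatchProjection, dots and separated are hypothetical names. Whether a given JVM actually vectorizes these loops depends on its version and flags, so this is something to measure rather than rely on.

```java
final class BatchProjection {
    static final int VERTS = 8; // vertices per polygon, as in the question

    /** Writes the dot product of the axis (ax, ay, az) with every vertex into out. */
    static void dots(float[] xs, float[] ys, float[] zs,
                     float ax, float ay, float az, float[] out) {
        // Simple counted loop over flat arrays, with no object dereferences inside.
        for (int i = 0; i < out.length; i++) {
            out[i] = ax * xs[i] + ay * ys[i] + az * zs[i];
        }
    }

    /** Expects 16 dot products: indices 0..7 for the movable polygon, 8..15 for the stationary one. */
    static boolean separated(float[] dots) {
        float minA = dots[0], maxA = dots[0];
        float minB = dots[VERTS], maxB = dots[VERTS];
        for (int i = 1; i < VERTS; i++) {
            minA = Math.min(minA, dots[i]);
            maxA = Math.max(maxA, dots[i]);
            minB = Math.min(minB, dots[VERTS + i]);
            maxB = Math.max(maxB, dots[VERTS + i]);
        }
        return minA > maxB || maxA < minB;
    }
}
```

Using Math.min and Math.max instead of if-statements removes the data-dependent branches, and the out array can be reused between calls so nothing is allocated per test.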