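I'm porting some code that makes heavy use of SSE4 intrinsics. It has a non-SSE implementation, but I'd like CPUs that only support SSE2 to still be able to use the faster functions.
Can someone suggest an efficient replacement for _mm_insert_epi32? I think I've got everything else done. Actually, in my case the second and third arguments to the function are both zero, so I am only writing 0 into element 0.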
Answer 0 (score: 2)
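So you really just want to zero the low element of your vector? That's a poor use case for _mm_insert_epi32: it's 2 uops on Intel CPUs, and one of them needs the shuffle port.
In both the SSE4.1 and SSE2 versions, use: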
foo = _mm_and_si128(vec, _mm_set_epi32(-1,-1,-1, 0)); // mask off the low element
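Or use movss from a zeroed vector, but that may incur bypass-delay latency from using an FP shuffle between two integer instructions. The C intrinsics version needs an annoying amount of casting, so it's easier to read as asm: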
; vec in xmm0
pxor xmm1, xmm1 ; _mm_setzero_si128()
movss xmm0, xmm1 ; zero the low 32 bits of xmm0
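For comparison, here is a rough sketch of what the movss approach might look like with C intrinsics, casts included. The helper name zero_low_movss is made up for illustration, and a compiler is free to pick a different instruction than movss for it (for example a blend when SSE4.1 is enabled).

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE1 _mm_move_ss */

/* zero_low_movss is a hypothetical helper name, not from the original post.
   It zeroes the low 32-bit element of vec and keeps the upper three. */
static inline __m128i zero_low_movss(__m128i vec)
{
    __m128 v    = _mm_castsi128_ps(vec);  /* cast only: generates no instruction */
    __m128 zero = _mm_setzero_ps();       /* all-zero register (xorps/pxor) */
    /* _mm_move_ss(a, b): low element from b, upper three elements from a */
    return _mm_castps_si128(_mm_move_ss(v, zero));
}
```

The casts are free at run time; they only exist to satisfy the type system, which is why the asm reads more cleanly.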
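Two _mm_insert_epi16 calls are almost certainly not the best approach, even if you want to replace an element other than the low one with variable contents. Each one (pinsrw) is a 2-uop instruction, so the pair costs 4 uops, and in many cases you can get the job done in fewer than that.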
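For reference, the two-pinsrw fallback under discussion could be written roughly like this. INSERT_EPI32_SSE2 is a made-up macro name; it is a macro rather than a function because _mm_insert_epi16 needs its index as a compile-time constant, and note that it evaluates x twice.

```c
#include <emmintrin.h>  /* SSE2 */

/* INSERT_EPI32_SSE2 is a hypothetical name, not a real intrinsic.
   It emulates _mm_insert_epi32(v, x, idx) using two 16-bit inserts.
   idx must be a compile-time constant in 0..3; x is evaluated twice. */
#define INSERT_EPI32_SSE2(v, x, idx)                              \
    _mm_insert_epi16(                                             \
        _mm_insert_epi16((v), (int)(x), 2 * (idx)),               \
        (int)((unsigned)(x) >> 16), 2 * (idx) + 1)
/* The inner insert writes the low 16 bits of x into word 2*idx;
   the outer insert writes the high 16 bits into word 2*idx + 1. */
```

This is the baseline that the more specific tricks are trying to beat, so it is mostly useful as a correctness reference or a fallback.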
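For variable contents, it's probably best to use _mm_cvtsi32_si128 (movd) and blend the two vectors together. The unpack instructions are handy for combining data from two registers, and so is shufps (yes, you can use it on integer data).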
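As one possible illustration of the movd-plus-blend idea, the sketch below replaces element 1 using an unpack followed by shufps. The function name replace_elem1_sse2 and the choice of element 1 are assumptions made for the example; other element positions need different shuffles.

```c
#include <emmintrin.h>  /* SSE2 */

/* replace_elem1_sse2 is a hypothetical helper name.
   Given vec = [v0 v1 v2 v3] (element 0 first), returns [v0 x v2 v3]. */
static inline __m128i replace_elem1_sse2(__m128i vec, int x)
{
    __m128i t  = _mm_cvtsi32_si128(x);        /* movd:      [x  0  0  0]  */
    __m128i lo = _mm_unpacklo_epi32(vec, t);  /* punpckldq: [v0 x  v1 0]  */
    /* shufps takes its low two elements from the first operand and its
       high two from the second: [lo0 lo1 vec2 vec3] = [v0 x v2 v3] */
    __m128 r = _mm_shuffle_ps(_mm_castsi128_ps(lo),
                              _mm_castsi128_ps(vec),
                              _MM_SHUFFLE(3, 2, 1, 0));
    return _mm_castps_si128(r);
}
```

Using shufps on integer data can add bypass latency on some CPUs, as mentioned above for movss, so it is worth measuring in context.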
You can also shuffle vec so that the element to be replaced is the low element, and then replace it with movss (or AND / OR).
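A minimal sketch of the shuffle-then-movss idea, assuming the target is element 2 and that the original element order has to be restored afterwards. The names replace_elem2_via_low and SWAP_0_2 are invented for this example. If the surrounding code already needs a shuffle, or does not care about element order, one of the two pshufd steps may be unnecessary.

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical names, for illustration only.
   SWAP_0_2 selects [v2 v1 v0 v3], i.e. it swaps elements 0 and 2. */
#define SWAP_0_2 _MM_SHUFFLE(3, 0, 1, 2)

/* Given vec = [v0 v1 v2 v3] (element 0 first), returns [v0 v1 x v3]. */
static inline __m128i replace_elem2_via_low(__m128i vec, int x)
{
    __m128i s = _mm_shuffle_epi32(vec, SWAP_0_2);  /* pshufd: [v2 v1 v0 v3] */
    __m128i t = _mm_cvtsi32_si128(x);              /* movd:   [x  0  0  0]  */
    /* movss: low element from t, upper three from s -> [x v1 v0 v3] */
    __m128i m = _mm_castps_si128(
        _mm_move_ss(_mm_castsi128_ps(s), _mm_castsi128_ps(t)));
    return _mm_shuffle_epi32(m, SWAP_0_2);         /* swap back: [v0 v1 x v3] */
}
```

Whether this beats two pinsrw instructions depends on the surrounding code, so treat it as a starting point to benchmark rather than a guaranteed win.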
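For the general case, maybe 2x pinsrw isn't horrible, but most specific cases should let you come up with something better. See http://agner.org/optimize/ and the x86 tag wiki for more resources to help you write efficient code.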