Is there an SSE2 equivalent of _mm_insert_epi32?

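Time: 2016-07-14 21:24:51

Tags: sse

I'm porting some code that makes heavy use of SSE4 intrinsics. It has a non-SSE implementation, but I'd like CPUs that only support SSE2 to still be able to use the faster functions.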

Could anyone suggest an efficient replacement for _mm_insert_epi32? I think I've done everything else... Actually, in my case the second and third parameters to the function are zero, i.e. the calls have the form _mm_insert_epi32(vec, 0, 0).

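1 answer:

Answer 0 (score: 2)

So you literally want to zero the low element of a vector? That's a bad use case for _mm_insert_epi32. It's 2 uops on Intel CPUs, and one of them needs the shuffle port.

In both your SSE4.1 and SSE2 versions, use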

foo = _mm_and_si128(vec, _mm_set_epi32(-1,-1,-1, 0));   // mask off the low element

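Or use movss from a zeroed vector, but that may incur a bypass delay from using an FP shuffle between two integer instructions. The C intrinsics version needs an annoying amount of casting, so it's easier to read as asm.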

; vec in xmm0
pxor   xmm1, xmm1    ; _mm_setzero_si128()
movss  xmm0, xmm1    ; zero the low 32 bits of xmm0
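
For comparison, here is a rough sketch of what the same movss approach looks like with intrinsics, casts included. The helper name zero_low_epi32_movss is made up for illustration; the intrinsics themselves are standard SSE/SSE2 ones.

```c
#include <emmintrin.h>  /* SSE2; also provides the SSE1 float intrinsics */

/* Hypothetical helper name: zero the low 32-bit element of vec with movss. */
static inline __m128i zero_low_epi32_movss(__m128i vec)
{
    __m128 v    = _mm_castsi128_ps(vec);  /* reinterpret only, no instruction */
    __m128 zero = _mm_setzero_ps();       /* xorps / pxor */
    /* _mm_move_ss(a, b): low element comes from b, upper three from a */
    return _mm_castps_si128(_mm_move_ss(v, zero));
}
```

The casts are free at runtime; they only exist to satisfy the type system, which is why the asm reads more cleanly.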

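2x _mm_insert_epi16 is almost certainly not the best way, even if you want to replace an element other than the low one with variable contents. It's a 2-uop instruction, and in many cases you can get the job done in fewer than 4 uops.

For variable contents, it's best to use _mm_cvtsi32_si128 (movd) and blend the two vectors together. Unpack instructions are handy for combining data from two registers, and so is shufps (yes, you can use it on integer data).

You could also shuffle vec so that the element to be replaced is the low element, then replace it with movss (or AND / OR).

For the general case, maybe 2x pinsrw isn't terrible, but most specific cases should let you come up with something better. See http://agner.org/optimize/ and the tag wiki for more resources to help you write efficient code.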
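
For reference, the generic 2x pinsrw fallback being discussed might look roughly like this. The macro name INSERT_EPI32_SSE2 is hypothetical, and idx has to be a compile-time constant in 0..3 because pinsrw takes an immediate.

```c
#include <emmintrin.h>

/* Hypothetical macro name: emulate _mm_insert_epi32 with two pinsrw.
 * idx must be a compile-time constant 0..3; val is evaluated twice. */
#define INSERT_EPI32_SSE2(vec, val, idx)                                \
    _mm_insert_epi16(                                                   \
        _mm_insert_epi16((vec), (int)((unsigned)(val) & 0xFFFFu),       \
                         2 * (idx)),       /* low 16 bits of val  */    \
        (int)((unsigned)(val) >> 16),                                   \
        2 * (idx) + 1)                     /* high 16 bits of val */
```

It works for any element, but costs two 2-uop instructions, which is the overhead the alternatives here try to avoid.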
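
As one possible illustration of the movd-plus-shuffle idea, here is a sketch that replaces element 1 with a runtime value using punpckldq and shufps. The function name replace_elem1_epi32 and the choice of element 1 are assumptions for the example; other positions need a different shuffle sequence.

```c
#include <emmintrin.h>

/* Hypothetical helper: return vec with 32-bit element 1 replaced by x. */
static inline __m128i replace_elem1_epi32(__m128i vec, int x)
{
    __m128i xv = _mm_cvtsi32_si128(x);         /* movd:      [x,  0, 0,  0] */
    __m128i lo = _mm_unpacklo_epi32(vec, xv);  /* punpckldq: [v0, x, v1, 0] */
    /* shufps: elements 0,1 from the first operand, 2,3 from the second */
    __m128 r = _mm_shuffle_ps(_mm_castsi128_ps(lo),
                              _mm_castsi128_ps(vec),
                              _MM_SHUFFLE(3, 2, 1, 0));
    return _mm_castps_si128(r);                /* [v0, x, v2, v3] */
}
```

That is three single-uop instructions (movd, punpckldq, shufps) instead of two pinsrw.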
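
A minimal sketch of that shuffle-then-movss idea, here for element 2. The name replace_elem2_epi32 is hypothetical, and the final shuffle back to the original order is my assumption; it can be dropped if the following code can work with the permuted vector.

```c
#include <emmintrin.h>

/* Hypothetical helper: return vec with 32-bit element 2 replaced by x. */
static inline __m128i replace_elem2_epi32(__m128i vec, int x)
{
    /* pshufd: swap elements 0 and 2 -> [v2, v1, v0, v3] */
    __m128i sw = _mm_shuffle_epi32(vec, _MM_SHUFFLE(3, 0, 1, 2));
    __m128i xv = _mm_cvtsi32_si128(x);               /* movd: [x, 0, 0, 0] */
    /* movss: low element from xv, upper three from sw -> [x, v1, v0, v3] */
    __m128i r = _mm_castps_si128(
        _mm_move_ss(_mm_castsi128_ps(sw), _mm_castsi128_ps(xv)));
    /* swap back -> [v0, v1, x, v3] */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 0, 1, 2));
}
```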