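Basically, I have 8 pieces of data, 2 bits each (4 states), stored in the 16 LSBs of a 32-bit integer. I want to reverse the order of the pieces in order to do some pattern matching.
I am given a reference integer and 8 candidates, and I need to match one of the candidates to the reference. However, the matching candidate may have been transformed in a predictable way.
If the reference data has the form [0,1,2,3,4,5,6,7], then a possible match can take one of these 8 forms: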
[0,1,2,3,4,5,6,7], [0,7,6,5,4,3,2,1]
[6,7,0,1,2,3,4,5], [2,1,0,7,6,5,4,3]
[4,5,6,7,0,1,2,3], [4,3,2,1,0,7,6,5]
[2,3,4,5,6,7,0,1], [6,5,4,3,2,1,0,7]
The pattern is that the data is always in order, but it may be reversed and rotated.
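As a minimal sketch of the matching rule just described: assuming position 0 is the least significant piece, the four non-reversed forms are rotations of the reference by 0, 2, 4 and 6 pieces, and the four reversed forms are rotations of the fully reversed value by 1, 3, 5 and 7 pieces. (The set of forms is the same if position 0 is taken to be the most significant piece.) The names `reverse_pieces`, `rotl16`, `is_match` and `find_match` are illustrative only.

```c
#include <stdint.h>

/* Reverse the order of the eight 2-bit pieces in a 16-bit value. */
uint16_t reverse_pieces(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint16_t)((r << 2) | (v & 3u)); /* append lowest piece of v */
        v >>= 2;
    }
    return r;
}

/* Rotate a 16-bit value left by `bits` (0..15). */
uint16_t rotl16(uint16_t v, unsigned bits)
{
    uint32_t x = v;
    return (uint16_t)((x << bits) | (x >> ((16u - bits) & 15u)));
}

/* Return 1 if cand is one of the 8 allowed forms of ref, else 0. */
int is_match(uint16_t ref, uint16_t cand)
{
    uint16_t rev = reverse_pieces(ref);
    for (unsigned k = 0; k < 4; k++) {
        if (cand == rotl16(ref, 4u * k))      /* rotated by 0, 2, 4, 6 pieces */
            return 1;
        if (cand == rotl16(rev, 4u * k + 2u)) /* reversed, rotated by 1, 3, 5, 7 pieces */
            return 1;
    }
    return 0;
}

/* Return the index of the first matching candidate, or -1 if none matches. */
int find_match(uint16_t ref, const uint16_t cand[8])
{
    for (int i = 0; i < 8; i++)
        if (is_match(ref, cand[i]))
            return i;
    return -1;
}
```

With this reading, the plain reversal [7,6,5,4,3,2,1,0] is not itself one of the eight forms, so `is_match` rejects it unless it happens to coincide with one of them.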
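I am implementing this in C and MIPS. I have both working, but they seem clunky. My current approach is to mask out each piece, shift it to its new position, and OR it into a new variable (initialized to 0).
The C version is more hard-coded: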
int ref = 4941; // reference value, original order [1,3,0,1,3,0,1,0], (encoded as 0b0001001101001101)
int rev = 0;
rev |= ((ref & 0x0003) << 14) | ((ref & 0x000C) << 10) | ((ref & 0x0030) << 6) | ((ref & 0x00C0) << 2); // move bottom 8 bits to top
rev |= ((ref & 0xC000) >> 14) | ((ref & 0x3000) >> 10) | ((ref & 0x0C00) >> 6) | ((ref & 0x0300) >> 2); // move top 8 bits to bottom
// rev = 29124 reversed order [0,1,0,3,1,0,3,1], (0b0111000111000100)
In MIPS I implemented a loop to try to reduce the number of static instructions:
lw $01, Reference($00) # load reference value
addi $04, $00, 4 # initialize $04 as Loop counter
addi $05, $00, 14 # initialize $05 to hold shift value
addi $06, $00, 3 # initialize $06 to hold mask (one piece of data)
# Reverse the order of data in Reference and store it in $02
Loop: addi $04, $04, -1 # decrement Loop counter
and $03, $01, $06 # mask out one piece ($03 = Reference & $06)
sllv $03, $03, $05 # shift piece to new position ($03 <<= $05)
or $02, $02, $03 # put piece into $02 ($02 |= $03)
sllv $06, $06, $05 # shift mask for next piece
and $03, $01, $06 # mask out next piece ($03 = Reference & $06)
srlv $03, $03, $05 # shift piece to new position ($03 >>= $05)
or $02, $02, $03 # put new piece into $02 ($02 |= $03)
srlv $06, $06, $05 # shift mask back
addi $05, $05, -4 # decrease shift amount by 4
sll $06, $06, 2 # shift mask for next loop
bne $04, $00, Loop # keep looping while $04 != 0
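For comparison, the same loop can be sketched in C. This is only an illustrative transcription of the MIPS code above, with `rev` standing in for $02 (explicitly zeroed here), `mask` for $06, `shift` for $05 and `count` for $04; the function name `reverse_loop` is made up for the example.

```c
/* C transcription of the MIPS loop: each iteration swaps one low piece
   with its mirror-image high piece. */
unsigned reverse_loop(unsigned ref)
{
    unsigned rev = 0;    /* $02: result, must start at 0 */
    unsigned mask = 3;   /* $06: selects one 2-bit piece */
    int shift = 14;      /* $05: distance between the mirrored pieces */

    for (int count = 4; count != 0; count--) {  /* $04: loop counter */
        rev |= (ref & mask) << shift;  /* low piece moves up */
        mask <<= shift;                /* mask now selects the mirrored high piece */
        rev |= (ref & mask) >> shift;  /* high piece moves down */
        mask >>= shift;                /* shift mask back */
        mask <<= 2;                    /* advance to the next low piece */
        shift -= 4;                    /* mirrored pieces are now 4 bits closer */
    }
    return rev;
}
```

Calling `reverse_loop(4941)` reproduces the hard-coded result, 29124.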
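Is there a way to simplify this, or at least reduce the number of instructions?
Answer 0 (score: 0)
For a very simple and efficient approach, use a 256-byte lookup table and perform 2 lookups: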
extern unsigned char const xtable[256];
unsigned int ref = 4941; // same reference value as in the question
unsigned int rev = (xtable[ref & 0xFF] << 8) | xtable[ref >> 8];
The xtable array can be statically initialized with a set of macros. Each entry holds one byte with its four 2-bit pieces reversed:
#define S(x) ((((x) & 0x03) << 6) | (((x) & 0x0C) << 2) | \
(((x) & 0x30) >> 2) | (((x) & 0xC0) >> 6))
#define X8(m,n) m((n)+0), m((n)+1), m((n)+2), m((n)+3), \
m((n)+4), m((n)+5), m((n)+6), m((n)+7)
#define X32(m,n) X8(m,(n)), X8(m,(n)+8), X8(m,(n)+16), X8(m,(n)+24)
unsigned char const xtable[256] = {
X32(S, 0), X32(S, 32), X32(S, 64), X32(S, 96),
X32(S, 128), X32(S, 160), X32(S, 192), X32(S, 224),
};
#undef S
#undef X8
#undef X32
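If space is not at a premium, you could use a single lookup into a 128 KB table (65,536 entries of 16 bits each). You can compute it at startup, or generate it with a script ahead of time and include it at compile time, but this is somewhat wasteful and not cache-friendly.
Answer 1 (score: 0)
To reverse the order of the 2-bit pieces, you can use the following code.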
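A minimal sketch of the single-lookup variant, assuming the table is filled once at startup. The names `rev_table`, `rev_table_init`, `rev_lookup` and `reverse_pieces` are hypothetical, and the slow piece-by-piece reversal is used only to populate the table.

```c
#include <stdint.h>

/* 65,536 entries x 2 bytes = 128 KB */
uint16_t rev_table[1u << 16];

/* Reverse the eight 2-bit pieces of a 16-bit value, one piece at a time. */
uint16_t reverse_pieces(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint16_t)((r << 2) | (v & 3u));
        v >>= 2;
    }
    return r;
}

/* Fill the table once, e.g. at program startup. */
void rev_table_init(void)
{
    for (uint32_t v = 0; v < (1u << 16); v++)
        rev_table[v] = reverse_pieces((uint16_t)v);
}

/* After initialization, a reversal is a single memory access. */
uint16_t rev_lookup(uint16_t v)
{
    return rev_table[v];
}
```

Call `rev_table_init()` once before the first `rev_lookup()`. Whether this beats the two-lookup version depends on how well the 128 KB table stays in cache.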
static int rev(int v){
// swap adjacent pairs of bits
v = ((v >> 2) & 0x3333) | ((v & 0x3333) << 2);
// swap nibbles
v = ((v >> 4) & 0x0f0f) | ((v & 0x0f0f) << 4);
// swap bytes
v = ((v >> 8) & 0x00ff) | ((v & 0x00ff) << 8);
return v;
}
The MIPS implementation takes 15 instructions.
rev: # value to reverse in $01
# uses $02 reg
srl $02, $01, 2
andi $02, $02, 0x3333
andi $01, $01, 0x3333
sll $01, $01, 2
or $01, $01, $02
srl $02, $01, 4
andi $02, $02, 0x0f0f
andi $01, $01, 0x0f0f
sll $01, $01, 4
or $01, $01, $02
srl $02, $01, 8
andi $02, $02, 0xff
andi $01, $01, 0xff
sll $01, $01, 8
or $01, $01, $02
# result in $01
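Note that simply by doubling the width of the constants you can reverse two 16-bit values at once (or even four on a 64-bit machine by quadrupling them). I am not sure whether that is useful in your case, though.
Answer 2 (score: 0)
Note: be careful with hand-written optimized assembly. Processor-specific optimizations are really only worth keeping if you are in a tight loop and unhappy with what the compiler generates.
You can improve the pipelining (the compiler does this for you if you code in C) and use the delay slot of the bne instruction. This improves your instruction-level parallelism.
Assume you have a MIPS processor with 1 branch delay slot and a 5-stage pipeline (instruction fetch, decode, execute, memory, write-back).
This pipeline introduces read-after-write (RaW) data-dependency hazards, most of them on the $3 register.
A RaW hazard causes the pipeline to stall. The reordered loop below places an independent instruction between each write and the read that depends on it, and moves the final sll into the bne delay slot.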
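As a sketch of that idea, here is the same three-step swap with each mask repeated in both 16-bit halves of a 32-bit word. The name `rev2x16` is illustrative, and `uint32_t` is used instead of `int` to keep the shifts free of sign issues.

```c
#include <stdint.h>

/* Reverse the 2-bit pieces of two packed 16-bit values at once.
   Each half of the 32-bit word is reversed independently. */
uint32_t rev2x16(uint32_t v)
{
    /* swap adjacent 2-bit pieces */
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    /* swap nibbles */
    v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
    /* swap bytes within each 16-bit half */
    v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
    return v;
}
```

For example, packing the reference into both halves, `rev2x16(0x134D134D)` yields `0x71C471C4`. On MIPS the 32-bit masks no longer fit in an `andi` immediate, so each one would first have to be loaded into a register (for instance with `lui`/`ori`), which eats into the savings.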
# Reverse the order of data in Reference and store it in $02
Loop: and $03, $01, $06 # mask out one piece ($03 = Reference & $06)
addi $04, $04, -1 # decrement Loop counter (placed here to separate the write and read of $03)
sllv $03, $03, $05 # shift piece to new position ($03 <<= $05)
sllv $06, $06, $05 # shift mask for next piece
or $02, $02, $03 # put piece into $02 ($02 |= $03)
and $03, $01, $06 # mask out next piece ($03 = Reference & $06)
srlv $06, $06, $05 # shift mask back
srlv $03, $03, $05 # shift piece to new position ($03 >>= $05)
addi $05, $05, -4 # decrease shift amount by 4
or $02, $02, $03 # put new piece into $02 ($02 |= $03)
bne $04, $00, Loop # keep looping while $04 != 0
sll $06, $06, 2 # shift mask for next loop (executes in the bne delay slot)
If you have a superscalar processor, this solution needs some changes.