I am trying to optimize an implementation of the KASUMI cipher written in C.
It uses S-boxes to encrypt data, which I represent as large arrays:
int S7[128] = {
54, 50, 62, 56, 22, 34, 94, 96, 38, 6, 63, 93, 2, 18,123, 33,
55,113, 39,114, 21, 67, 65, 12, 47, 73, 46, 27, 25,111,124, 81,
53, 9,121, 79, 52, 60, 58, 48,101,127, 40,120,104, 70, 71, 43,
20,122, 72, 61, 23,109, 13,100, 77, 1, 16, 7, 82, 10,105, 98,
117,116, 76, 11, 89,106, 0,125,118, 99, 86, 69, 30, 57,126, 87,
112, 51, 17, 5, 95, 14, 90, 84, 91, 8, 35,103, 32, 97, 28, 66,
102, 31, 26, 45, 75, 4, 85, 92, 37, 74, 80, 49, 68, 29,115, 44,
64,107,108, 24,110, 83, 36, 78, 42, 19, 15, 41, 88,119, 59, 3
};
int S9[512] = {
167,239,161,379,391,334, 9,338, 38,226, 48,358,452,385, 90,397,
183,253,147,331,415,340, 51,362,306,500,262, 82,216,159,356,177,
175,241,489, 37,206, 17, 0,333, 44,254,378, 58,143,220, 81,400,
95, 3,315,245, 54,235,218,405,472,264,172,494,371,290,399, 76,
165,197,395,121,257,480,423,212,240, 28,462,176,406,507,288,223,
501,407,249,265, 89,186,221,428,164, 74,440,196,458,421,350,163,
232,158,134,354, 13,250,491,142,191, 69,193,425,152,227,366,135,
344,300,276,242,437,320,113,278, 11,243, 87,317, 36, 93,496, 27,
487,446,482, 41, 68,156,457,131,326,403,339, 20, 39,115,442,124,
475,384,508, 53,112,170,479,151,126,169, 73,268,279,321,168,364,
363,292, 46,499,393,327,324, 24,456,267,157,460,488,426,309,229,
439,506,208,271,349,401,434,236, 16,209,359, 52, 56,120,199,277,
465,416,252,287,246, 6, 83,305,420,345,153,502, 65, 61,244,282,
173,222,418, 67,386,368,261,101,476,291,195,430, 49, 79,166,330,
280,383,373,128,382,408,155,495,367,388,274,107,459,417, 62,454,
132,225,203,316,234, 14,301, 91,503,286,424,211,347,307,140,374,
35,103,125,427, 19,214,453,146,498,314,444,230,256,329,198,285,
50,116, 78,410, 10,205,510,171,231, 45,139,467, 29, 86,505, 32,
72, 26,342,150,313,490,431,238,411,325,149,473, 40,119,174,355,
185,233,389, 71,448,273,372, 55,110,178,322, 12,469,392,369,190,
1,109,375,137,181, 88, 75,308,260,484, 98,272,370,275,412,111,
336,318, 4,504,492,259,304, 77,337,435, 21,357,303,332,483, 18,
47, 85, 25,497,474,289,100,269,296,478,270,106, 31,104,433, 84,
414,486,394, 96, 99,154,511,148,413,361,409,255,162,215,302,201,
266,351,343,144,441,365,108,298,251, 34,182,509,138,210,335,133,
311,352,328,141,396,346,123,319,450,281,429,228,443,481, 92,404,
485,422,248,297, 23,213,130,466, 22,217,283, 70,294,360,419,127,
312,377, 7,468,194, 2,117,295,463,258,224,447,247,187, 80,398,
284,353,105,390,299,471,470,184, 57,200,348, 63,204,188, 33,451,
97, 30,310,219, 94,160,129,493, 64,179,263,102,189,207,114,402,
438,477,387,122,192, 42,381, 5,145,118,180,449,293,323,136,380,
43, 66, 60,455,341,445,202,432, 8,237, 15,376,436,464, 59,461
};
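These arrays are accessed very often during encryption. One optimization I tried was moving the arrays from the header file into the function that uses them, hoping to avoid some cache misses.
Are there any suggestions for optimizing this further, for example by replacing the arrays with some other data structure?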
Answer 0 (score: 2)
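Those arrays are not large. A typical L1 cache is at least tens of kB (about the total memory of an Apple II). Also, moving the arrays from a header into the function does not change cache locality.
Storing them in an appropriately sized type (as suggested in the comments) may make sense. They fit in the L1 cache either way, but if other data competes for the cache, possibly from another thread, smaller tables have a better chance of staying there. No value needs more than 2 bytes, though I don't know whether narrower types carry an extra cost compared with a native-sized int.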
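To put numbers on the size argument, here is a minimal sketch that computes the memory footprint of both tables for a given element size. The names `sbox_bytes`, `sbox_bytes_as_int` and `sbox_bytes_narrow` are hypothetical helpers introduced only for this illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Table sizes from the question: S7 has 2^7 entries, S9 has 2^9. */
enum { S7_ENTRIES = 128, S9_ENTRIES = 512 };
static_assert(S7_ENTRIES == (1 << 7) && S9_ENTRIES == (1 << 9),
              "table sizes must match the 7-bit and 9-bit index widths");

/* Hypothetical helper: total bytes for both tables, given the size of
 * one S7 entry and one S9 entry. */
static size_t sbox_bytes(size_t s7_elem, size_t s9_elem)
{
    return S7_ENTRIES * s7_elem + S9_ENTRIES * s9_elem;
}

/* As declared in the question: every entry is an int. */
static size_t sbox_bytes_as_int(void)
{
    return sbox_bytes(sizeof(int), sizeof(int));
}

/* Narrowed: one byte per S7 entry, two bytes per S9 entry. */
static size_t sbox_bytes_narrow(void)
{
    return sbox_bytes(sizeof(uint8_t), sizeof(uint16_t));
}
```

On a platform with a 4-byte `int` (an assumption, though a common one), `sbox_bytes_as_int()` gives 2560 bytes and `sbox_bytes_narrow()` gives 1152 bytes. Both are well below tens of kB, which is why the narrowing mostly matters when other data competes for the cache.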
If this really matters, you should look at the generated code and optimize that.
Answer 1 (score: 2)
First, make sure the arrays are declared const, so that the compiler knows they never change.
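Second, as Oli Charlesworth said in the comments, you don't need a full int to store each value. The elements of the S7 and S9 arrays are 7-bit and 9-bit unsigned integers, so either int8_t or uint8_t is enough for S7, and either int16_t or uint16_t is enough for S9. (You may want to benchmark whether signed versus unsigned types makes any difference, although I wouldn't really expect it to.)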
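As a rough sketch of the first two suggestions combined, S7 could be declared as below. The values are copied from the question; the `static` qualifier, the choice of `uint8_t`, and the helper name `s7_lookup` are assumptions made for this example. S9 would follow the same pattern with `uint16_t` and its 512 values.

```c
#include <assert.h>
#include <stdint.h>

/* S7 from the question, one byte per entry (all values are 0..127). */
static const uint8_t S7[128] = {
     54, 50, 62, 56, 22, 34, 94, 96, 38,  6, 63, 93,  2, 18,123, 33,
     55,113, 39,114, 21, 67, 65, 12, 47, 73, 46, 27, 25,111,124, 81,
     53,  9,121, 79, 52, 60, 58, 48,101,127, 40,120,104, 70, 71, 43,
     20,122, 72, 61, 23,109, 13,100, 77,  1, 16,  7, 82, 10,105, 98,
    117,116, 76, 11, 89,106,  0,125,118, 99, 86, 69, 30, 57,126, 87,
    112, 51, 17,  5, 95, 14, 90, 84, 91,  8, 35,103, 32, 97, 28, 66,
    102, 31, 26, 45, 75,  4, 85, 92, 37, 74, 80, 49, 68, 29,115, 44,
     64,107,108, 24,110, 83, 36, 78, 42, 19, 15, 41, 88,119, 59,  3
};

/* The narrowed tables occupy 128 bytes (S7) and 1024 bytes (S9). */
static_assert(sizeof S7 == 128, "S7 should take one byte per entry");
static_assert(sizeof(uint16_t[512]) == 1024, "S9 as uint16_t takes 1 KiB");

/* Hypothetical helper: the mask keeps the index inside the table even
 * if the caller passes more than 7 bits. */
static inline uint8_t s7_lookup(unsigned x)
{
    return S7[x & 0x7Fu];
}
```

Whether the narrower loads are faster, slower, or the same as `int` loads depends on the target and compiler, so it is worth measuring before and after the change.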
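Finally, if you really want to get rid of the arrays entirely, the KASUMI S-boxes can also be implemented directly, without any lookup tables, using bitwise operations (specifically AND and XOR). See pages 13-16 of the KASUMI specification for details. However, I strongly suspect this is not useful for a software implementation unless you use bit-slicing to encrypt multiple blocks in parallel.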