Question

似乎gcc对复杂的常量折叠有一些限制。这是一个例子：

static inline unsigned int DJBHash(const char *str)
{
   int i;
   unsigned int hash = 5381;

   for(i = 0; i < strlen(str); i++)
   {
      hash = ((hash << 5) + hash) + str[i];   
   }

   return hash;
}

int f(void)
{   
    return DJBHash("01234567890123456");
}

当使用-O3优化级别（gcc 4.8）运行时，它会很好地展开DJBHash中的循环并在编译期间计算该字符串的哈希值。

但是，当字符串长一个字符return DJBHash("012345678901234567");时，它不再折叠它，并生成带有条件跳转指令的循环。

我想将任意长度的文字字符串折叠为其哈希值作为编译时常量这可以吗？

澄清

我的问题是关于gcc的常量折叠优化（请参阅标题 - 请不要删除 gcc 和编译器标记）<登记/> 这里的许多答案试图用模板或constexpr解决问题。很高兴知道这些选项，并感谢发布它们为所有人的利益。但是，他们没有直接回答我的问题。

实际上，我正在使用gcc端口，因此如果需要，我可以更改和构建gcc源代码。但我只限于C，我想在这个范围内解决这个问题。

Answer 1

这是使用constexpr的版本。它在一个方面与其他方面略有不同 - 递归，最简单的方法是将字符串哈希回到前面，可以这么说。例如，它为＆＃34; abc＆＃34;提供的值。将是你通常期望的＆＃34; cba＆＃34;代替。我不认为这应该在使用中产生任何真正的不同，只要你一直使用其中一个（但考虑到散列的变化，我可能会错误）。

它确实在编译时进行评估 - 例如，我们可以将结果用作switch语句中的标签：

#include <iostream>

unsigned constexpr const_hash(char const *input) {
    return *input ?
           static_cast<unsigned>(*input) + 33 * const_hash(input + 1) :
           5381;
}

int main(int argc, char **argv) {
    switch (const_hash(argv[1])) {
    case const_hash("one"): std::cout << "one"; break;
    case const_hash("two"): std::cout << "two"; break;
    }
}

显然，可能存在冲突，因此您通常不会将其用作案例陈述标签 - 我大多数情况下这样做是为了强制一种情况，即如果结果不是，则无法编译。 ta编译时常量。

编辑：如果你关心哈希算法是＆＃34;正确＆＃34;，我想这更准确（感谢@Abyx）：

unsigned constexpr const_hash(char const *input, unsigned hash = 5381) {
    return *input ?
        const_hash(input + 1, hash * 33 + static_cast<unsigned>(*input)): 
        hash;
}

Answer 2

OP对C中的常量折叠感兴趣，但仅仅是因为它的C ++兄弟：在C ++ 14中，你可以简单地将constexpr放在两个函数的前面，并修改循环以补偿strlen()不是constexpr

#include<iostream>

static inline constexpr unsigned int DJBHash(const char *str)
{
   unsigned int hash = 5381;

   for(auto i = 0; i < 512; ++i) {
      if (*str == '\0') return hash;
      hash = ((hash << 5) + hash) + static_cast<unsigned int>(*str);   
   }

   return hash;
}

constexpr unsigned int f(void)
{   
    return DJBHash("01234567890123456");
}

int main()
{
    constexpr auto h = f(); 
    std::cout << std::hex << h << "\n"; // 88a7b505
}

Live Example使用Clang 3.4 SVN和-std=c++1y。

注意：当前的Clang实施无法与while(*str != '\0')正常运行。相反，有一个带有返回条件的有限循环512可以完成工作。

Answer 3

也许C ++ TMP可能会这样做。我不确定。

如果你不介意使用可变字符文字列表而不是字符串文字，这是可能的：

#include <type_traits>
#include <iostream>

template<unsigned acc, char... values>
struct DJBhash_helper
     : std::integral_constant<unsigned, acc> {};

template<unsigned acc, char head, char... tail>
struct DJBhash_helper<acc, head, tail...>
     : DJBhash_helper<(acc << 5) + acc + head, tail...> {};

template<char... str>
struct DJBhash
     : DJBhash_helper<5381, str...> {};

int main()
{
    std::cout << DJBhash<'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                         '0', '1', '2', '3', '4', '5', '6', '7'>::value << '\n';
}

ideone live demo

Answer 4

不是答案，只是另一个数据点。

以下实施情况更糟。 GCC 4.7.3正确应用TCO将此实现转换为循环，但它仅在编译时计算为“0”！

static inline unsigned int DJBHash2(const char *str, unsigned int hash) {
   return *str ? DJBHash2(str + 1, 33 * hash + *str) : hash; }

在正面，递归版本缩短了7个字节。

有人提到clang，所以这里是clang 3.1 -O3的结果。它为两个版本的DJBHash生成不同的代码，但它们的字节数相同。有趣的是，它将转换和原始版本的转换转换为乘法。它将两个版本优化为常量，最多可包含100个字符的字符串。最后，clang代码比最短的GCC代码短5个字节。

gcc复数常数折叠

澄清

4 个答案: