Question

Go中的rune类型为defined

int32的别名，在所有方面都相当于int32。它是按照惯例，用于区分字符值和整数值。

如果打算使用此类型来表示字符值，为什么Go语言的作者不使用uint32而不是int32？当它是负数时，他们如何期望在程序中处理rune值？其他类似的类型byte是uint8（而不是int8）的别名，这似乎是合理的。

Answer 1

我用Google搜索并发现了这个： https://groups.google.com/forum/#!topic/golang-nuts/d3_GPK8bwBg

已多次询问过这个问题。符文占用4个字节而不仅仅是一个，因为它应该存储unicode代码点而不仅仅是ASCII字符。与数组索引一样，数据类型也是有符号的，这样您就可以在对这些类型进行算术运算时轻松检测溢出或其他错误。

Answer 2

它不会变成负面的。 Unicode中目前有1,114,112个代码点，远远不是2,147,483,647（0x7fffffff） - 即使考虑了所有保留的块。

Answer 3

“Golang, Go : what is rune by the way?”提到：

使用最近的Unicode 6.3，定义了超过110,000个符号。这需要每个代码点至少21位表示，因此符文类似于int32并且具有大量位。

但是关于溢出或负值问题，请注意某些unicode函数（如unicode.IsGraphic）的实现包括：

我们转换为uint32以避免额外的负面测试

代码：

const MaxLatin1 = '\u00FF' // maximum Latin-1 value.

// IsGraphic reports whether the rune is defined as a Graphic by Unicode.
// Such characters include letters, marks, numbers, punctuation, symbols, and
// spaces, from categories L, M, N, P, S, Zs.
func IsGraphic(r rune) bool {
    // We convert to uint32 to avoid the extra test for negative,
    // and in the index we convert to uint8 to avoid the range check.
    if uint32(r) <= MaxLatin1 {
        return properties[uint8(r)]&pg != 0
    }
    return In(r, GraphicRanges...)
}

这可能是因为符文应该是constant（如“Go rune type explanation”中所述，其中符文可能位于int32或uint32或甚至{ {1}}或...：常量值授权将其存储在numeric types}中的任何一个中。

Answer 4

允许使用负值这一事实使您可以定义自己的rune sentinel values。

例如：

const EOF rune = -1

func (l *lexer) next() (r rune) {
    if l.pos >= len(l.input) {
        l.width = 0
        return EOF
    }
    r, l.width = utf8.DecodeRuneInString(l.input[l.pos:])
    l.pos += l.width
    return r
}

在这里见过Rob Pike的演讲：Lexical Scanning in Go。

Answer 5

除了以上给出的答案外，这就是我为什么要加戈恩需要符文的两分钱。

GoLang中的字符串是字节数组，每个字符都表示为一个字节。因此，与其他语言相比，GoLang具有非常高的性能优势
但是由于我们需要一种表示无法用8位范围表示的UTF-8码点的方法，因此我们使用符文来表示它们。
为什么要使用int32，为什么不使用uint32？你可能会问。故意这样做是为了在对字符串进行操作时检测到溢出。

this文章详细介绍了所有这些内容

为什么golang中的符号是int32的别名而不是uint32？

5 个答案: