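Golang: converting from rune to string

Date: 2016-08-31 09:22:06

Tags: string parsing go unicode rune

I have the following code, which should convert a rune into a string and print it. However, I get undefined characters when it is printed. I am unable to figure out where the bug is: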

package main

import (
    "fmt"
    "strconv"
    "strings"
    "text/scanner"
)

func main() {
    var b scanner.Scanner
    const a = `a`
    b.Init(strings.NewReader(a))
    c := b.Scan()
    fmt.Println(strconv.QuoteRune(c))
}

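3 answers:

Answer 0 (score: 14)

That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads the next token or Unicode character, depending on which token classes are enabled in the Scanner.Mode bitmask, and for recognized tokens it returns special constants from the text/scanner package rather than the rune that was read.

To read a single rune, use Scanner.Next() instead: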

c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))

Output:

97 a 'a'

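If you just want to convert a single rune to string, use a simple type conversion. rune is an alias for int32, and the Go spec says this about converting integers to string:

  Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.

So: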

r := rune('a')
fmt.Println(r, string(r))

Output:

97 a
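
The spec sentence quoted above is easy to misread: the conversion produces the character for that code point, not the decimal digits of the number. A minimal sketch of the difference, where runeToText and runeToDigits are hypothetical helper names introduced only for this illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// runeToText converts a code point to a string holding that character.
func runeToText(r rune) string {
	return string(r)
}

// runeToDigits formats the numeric value of the code point in decimal.
func runeToDigits(r rune) string {
	return strconv.Itoa(int(r))
}

func main() {
	r := 'a'
	fmt.Println(runeToText(r))   // the character itself
	fmt.Println(runeToDigits(r)) // the number 97 written out as text
}
```

If the decimal text of an integer is what you need, strconv.Itoa (or fmt.Sprint) is the tool; the string(...) conversion is for code points.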

Also, to loop over the runes of a string value, you can simply use the for ... range construct:

for i, r := range "abc" {
    fmt.Printf("%d - %c (%v)\n", i, r, r)
}

Output:

0 - a (97)
1 - b (98)
2 - c (99)

Or you can simply convert the string value to []rune:

fmt.Println([]rune("abc")) // Output: [97 98 99]

There is also utf8.DecodeRuneInString().
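
As a rough sketch of how utf8.DecodeRuneInString() can be used, the loop below peels one rune at a time off the front of a string. The helper name decodeAll is an assumption made for this example, not something from the standard library:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll collects the runes of s by decoding them one at a time.
func decodeAll(s string) []rune {
	var out []rune
	for len(s) > 0 {
		// r is the decoded rune, size is how many bytes it occupied.
		r, size := utf8.DecodeRuneInString(s)
		out = append(out, r)
		s = s[size:]
	}
	return out
}

func main() {
	for _, r := range decodeAll("aഐb") {
		fmt.Printf("%c %U\n", r, r)
	}
}
```

For most code the for ... range loop or the []rune conversion is shorter; decoding by hand is mainly useful when you need the byte width of each rune as you go.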

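Try the examples on the Go Playground.

Note:

Your original code (using Scanner.Scan()) works like this:

  1. You called Scanner.Init(), which sets the mode (b.Mode) to scanner.GoTokens.
  2. Calling Scanner.Scan() on the input ("a") returns scanner.Ident, because "a" is a valid Go identifier: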

    c := b.Scan()
    if c == scanner.Ident {
        fmt.Println("Identifier:", b.TokenText())
    }
    
    // Output: "Identifier: a"
    

Answer 1 (score: 1)

I know I'm a bit late to the party, but here is a []rune to string function:

func runesToString(runes []rune) (outString string) {
    // don't need index so _
    for _, v := range runes {
        outString += string(v)
    }
    return
}

Yes, there is a named return, but I think it's fine in this case because it reduces the number of lines and the function is short.
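
For comparison, Go can also convert a []rune to a string directly, and a strings.Builder avoids building a new string on every loop iteration. The sketch below shows both; the function names runesToStringDirect and runesToStringBuilder are made up for this example:

```go
package main

import (
	"fmt"
	"strings"
)

// runesToStringDirect uses the built-in conversion from []rune to string.
func runesToStringDirect(runes []rune) string {
	return string(runes)
}

// runesToStringBuilder writes each rune into a strings.Builder.
func runesToStringBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	runes := []rune("aഐbc")
	fmt.Println(runesToStringDirect(runes))
	fmt.Println(runesToStringBuilder(runes))
}
```

The direct conversion is usually enough; the builder version is mostly of interest when runes are filtered or transformed while the string is assembled.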

Answer 2 (score: 1)

Since I came to this question searching for rune, string and char, I thought this might help newbies like me:

package main

import "fmt"

func main() {
    testString("aഐbc")
}

func testString(oneString string) {
    // string to byte slice: a plain conversion, which copies the string's bytes.
    twoByteArr := []byte(oneString)

    // string to rune slice: the conversion decodes the UTF-8 bytes into code points.
    threeRuneSlice := []rune(oneString)

    // A string is neither a byte slice nor a rune slice. It is an immutable
    // sequence of bytes that usually holds UTF-8 text, which is why both
    // conversions work.

    // A rune slice converts back to a string; each rune is re-encoded as UTF-8.
    thirdString := string(threeRuneSlice)

    // The catch is in printing "characters" with a for loop and range.

    fmt.Println("Chars in oneString")
    for i, r := range oneString {
        // range over a string yields the byte offset of each rune, so for
        // "aഐbc" the indices are 0, 1, 4, 5 rather than 0, 1, 2, 3.
        // See https://blog.golang.org/strings
        fmt.Printf(" %d  %v  %c ", i, r, r)
    }

    fmt.Println("\nChars in threeRuneSlice")
    for i, r := range threeRuneSlice {
        // Here i = 0, 1, 2, 3: one slice element per rune. A rune is an int32
        // (a byte is a uint8); in a string each rune is encoded as 1 to 4 bytes.
        fmt.Printf(" %d  %v  %c ", i, r, r)
    }

    fmt.Println("\nValues in oneString")
    for j := 0; j < len(oneString); j++ {
        // Indexing a string yields bytes, not runes, so this does not print characters.
        fmt.Printf(" %d %v ", j, oneString[j])
    }

    fmt.Println("\nValues in twoByteArr")
    for j := 0; j < len(twoByteArr); j++ {
        fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same values as the loop above
    }

    fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}

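And some more, admittedly pointless, demo at https://play.golang.org/p/tagRBVG8k7V adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ

It shows that "characters" are encoded using 1 to 4 bytes, depending on the Unicode code point.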
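
A quick way to see the 1 to 4 byte range for yourself is utf8.RuneLen from the standard library. The sample characters below are chosen arbitrarily for this sketch, and encodedLen is a hypothetical wrapper name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen reports how many bytes the UTF-8 encoding of r takes.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One sample rune for each encoded width.
	for _, r := range []rune{'a', '©', 'ഐ', '😀'} {
		fmt.Printf("%c %U %d byte(s)\n", r, r, encodedLen(r))
	}
}
```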