Parsing Unicode in a byte array

Time: 2016-12-20 17:55:13

Tags: c++ arrays unicode

I have a byte array that contains a series of characters. In one case I have

[28] = 0x6e
[29] = 0x61
[30] = 0x6d
[31] = 0x65
[32] = 0x00
[33] = 0x00
[34] = 0x00
[35] = 0x4f
[36] = 0x08
[37] = 0x00
[38] = 0x07
[39] = 0x00
[40] = 0x00
[41] = 0x04
[42] = 0x13
[43] = 0xff
[44] = 0xff
[45] = 0x00
[46] = 0x00

Elements 28 to 31 are the characters "name", and that section ends at element 32. Then I have another byte array:

[47] = 0x01
[48] = 0x03
[49] = 0x00
[50] = 0x00
[51] = 0x73
[52] = 0x65
[53] = 0xc3
[54] = 0xb1
[55] = 0x6f
[56] = 0x72
[57] = 0x00
[58] = 0x00
[59] = 0x00
[60] = 0x4f
[61] = 0x08
[62] = 0x00
[63] = 0x08
[64] = 0x00
[65] = 0x00
[66] = 0x04
[67] = 0x13
[68] = 0xff
[69] = 0xff
[70] = 0x00
[71] = 0x00

I think this holds the string señor. With the first array it is easy to find name as the first 4 bytes, with 00 as the terminator, but how do I interpret what is in the second byte array?

Both arrays are terminated with 00s.

1 answer:

Answer 0: (score: 1)

The text is evidently UTF-8 encoded:

[53] = 0xc3 [54] = 0xb1

This is the UTF-8 encoded ñ character (U+00F1), which takes two bytes. The surrounding bytes, 0x73 0x65 before it and 0x6f 0x72 after it, are the remaining four characters of señor.
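To make the byte arithmetic concrete, here is a minimal sketch of reading the NUL-terminated bytes and decoding the 0xc3 0xb1 pair by hand. The helper names `read_c_string` and `decode_two_byte` are made up for illustration. The decoder only handles the two-byte form (110xxxxx 10xxxxxx) and does not reject overlong encodings, so it is not a full UTF-8 decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: collect bytes from `offset` up to the first 0x00
// (or the end of the buffer). The result is still UTF-8 encoded.
std::string read_c_string(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    std::string out;
    for (std::size_t i = offset; i < buf.size() && buf[i] != 0x00; ++i) {
        out.push_back(static_cast<char>(buf[i]));
    }
    return out;
}

// Hypothetical helper: decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
// Returns 0 if the bytes do not match that bit pattern.
char32_t decode_two_byte(std::uint8_t lead, std::uint8_t cont) {
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80) {
        return 0;
    }
    // 5 payload bits from the lead byte, 6 from the continuation byte.
    return (static_cast<char32_t>(lead & 0x1F) << 6) | (cont & 0x3F);
}
```

Assuming the indices in the question are offsets into one buffer `bytes`, `read_c_string(bytes, 51)` would return the six bytes of señor, and `decode_two_byte(0xc3, 0xb1)` gives 0xF1, the code point of ñ.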

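The C++ library does have some facilities for working with UTF-8, but I have always found those library classes a bit clunky and inflexible. On most platforms you have the excellent, flexible iconv library, which has a simple, easy-to-use API for converting between UTF-8 and other encodings.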
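As a rough sketch of what an iconv-based conversion could look like, the function below turns a UTF-8 byte string into code points. The function name `utf8_to_utf32` is invented here, and the code assumes the platform's iconv accepts the encoding names "UTF-8" and "UTF-32LE" and declares the input parameter as `char**` (some platforms use `const char**` instead, and some need linking with `-liconv`).

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: convert UTF-8 bytes to code points via iconv.
// Assumes the encoding names "UTF-8" and "UTF-32LE" are supported.
std::u32string utf8_to_utf32(const std::string& utf8) {
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");  // arguments are (to, from)
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed");
    }

    std::string in = utf8;                     // iconv takes a non-const input pointer
    std::vector<char> out(in.size() * 4 + 4);  // at most 4 output bytes per input byte
    char* in_ptr = &in[0];
    char* out_ptr = out.data();
    std::size_t in_left = in.size();
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv failed: invalid or incomplete input");
    }

    // Reassemble the little-endian 4-byte units into code points.
    std::u32string result;
    const std::size_t written = out.size() - out_left;
    for (std::size_t i = 0; i + 3 < written; i += 4) {
        char32_t cp = 0;
        for (int b = 3; b >= 0; --b) {
            cp = (cp << 8) | static_cast<unsigned char>(out[i + b]);
        }
        result.push_back(cp);
    }
    return result;
}
```

Fed the six bytes 0x73 0x65 0xc3 0xb1 0x6f 0x72 from the question, this should return five code points, with U+00F1 in the third position.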