How do I read a declared char string of Unicode characters as two-digit hexadecimal values?

Time: 2019-02-05 00:09:26

Tags: c unicode hex byte

Given two strings, I have to read the two-digit hexadecimal values of each of their Unicode characters, ignoring ASCII characters.

char * str1 = "⍺";
char * str2 = "alpha is ⍺, beta is β and mu is µ";

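I tried printing these values with printf("<%02x>\n", str1);, but the value seems to be wrong (I also tried it with an (unsigned char) cast, and that did not seem to help).

The output should look like this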

<e2>
<e8><a2><2e>

Here is my full code:

#include <stdio.h>
#include <string.h>

char *str1 = "⍺";
char *str2 = "alpha is ⍺, beta is β and mu is µ";
char *str3 = "β";
char *str4 = "µ";

int main(){
    printf("<%x>\n", (unsigned char) * str1);
    printf("<%x>", (unsigned char) * str1);
    printf("<%x>", (unsigned char) * str3);
    printf("<%x>\n", (unsigned char) * str4);
}

1 answer:

Answer 0: (score: 0)

This code walks through the bytes of a string and identifies "ASCII" characters (Unicode U+0000 .. U+007F), which it normally does not print. For Unicode characters from U+0080 upwards, it prints a <, the series of hex digit pairs representing the character, and finally a >, with >< in the middle separating individual UTF-8-encoded Unicode characters. If you pass one or more arguments, it prints the "ASCII" characters too, but as themselves rather than encoded in hex.

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

static void dump_str(const char *s);
static bool print_ascii = false;

int main(int argc, char **argv)
{
    const char *strings[] =
    {
        "⍺",
        "alpha is ⍺, beta is β and mu is µ",
        "At -37ºC, the £ and the € fall apart",
        "å¬€£Åºüÿ",
        "⍺βµ",
    };
    enum { NUM_STRINGS = sizeof(strings) / sizeof(strings[0]) };

    // Use argv - my compilation options don't allow unused parameters to a function
    if (argc > 1 && argv[argc] == NULL)
        print_ascii = true;

    for (int i = 0; i < NUM_STRINGS; i++)
        dump_str(strings[i]);
    return 0;
}

static void dump_str(const char *s)
{
    int c;
    bool printing_ascii = true;
    while ((c = (unsigned char)*s++) != '\0')
    {
        if (isascii(c))
        {
            // ASCII byte: close any open <...> group first
            if (!printing_ascii)
            {
                printing_ascii = true;
                putchar('>');
            }
            if (print_ascii)
                putchar(c);
        }
        else
        {
            if (printing_ascii)
            {
                printing_ascii = false;
                putchar('<');
            }
            else
            {
                // Not a continuation byte (10xxxxxx), so a new character starts here
                if ((c & 0xC0) != 0x80)
                {
                    putchar('>');
                    putchar('<');
                }
            }
            printf("%2x", c);
        }
    }
    if (!printing_ascii)
        putchar('>');
    putchar('\n');
}

I called the program utf8-97; when run, it gave me:

$ ./utf8-97
<e28dba>
<e28dba><ceb2><c2b5>
<c2ba><c2a3><c2a0><e282ac>
<c3a5><c2ac><e282ac><c2a3><c385><c2ba><c3bc><c3bf>
<e28dba><ceb2><c2b5>
$ ./utf8-97 1
<e28dba>
alpha is <e28dba>, beta is <ceb2> and mu is <c2b5>
At -37<c2ba>C, the <c2a3><c2a0>and the <e282ac> fall apart
<c3a5><c2ac><e282ac><c2a3><c385><c2ba><c3bc><c3bf>
<e28dba><ceb2><c2b5>
$

The <c2a0> sequence is a non-breaking space that I accidentally put in the code after the pound sign £. I'm not sure whether you will get it if you copy the code from the answer.