iconv library incorrectly converts UTF-8 to KOI8-R

Time: 2018-01-15 04:02:12

Tags: c character-encoding iconv

I am trying to use the GNU iconv library to convert a UTF-8 encoded string to KOI8-R. My minimal example is

#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    /* The letter П in UTF-8. */
    char* buffer = "\xd0\x9f";
    size_t len = 2;
    /* Note: since KOI8-R is an 8-bit encoding, the buffer should only need a length of 1, but
     * iconv returns -1 if the buffer is any smaller than 4 bytes.
     */
    size_t len_in_koi = 4;
    char* buffer_in_koi = malloc(len_in_koi+1);
    /* A throwaway copy to give to iconv. */
    char* buffer_in_koi_copy = buffer_in_koi;
    iconv_t cd = iconv_open("UTF-8", "KOI8-R");
    if (cd == (iconv_t) -1) {
        fputs("Error while initializing iconv_t handle.\n", stderr);
        return 2;
    }
    if (iconv(cd, &buffer, &len, &buffer_in_koi_copy, &len_in_koi) != (size_t) -1) {
        /* Expecting f0 but get d0. */
        printf("Conversion successful! The byte is %x.\n", (unsigned char)(*buffer_in_koi));
    } else {
        fputs("Error while converting buffer to KOI8-R.\n", stderr);
        return 3;
    }
    iconv_close(cd);
    free(buffer_in_koi);
    return 0;
}

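Apart from failing when my KOI8-R buffer is smaller than 4 bytes (even though it should need only one), this program incorrectly prints d0; the correct encoding of 'П' in KOI8-R is f0.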

iconv gives the correct answer from the command line (e.g. echo П | iconv -t KOI8-R | hexdump), so what am I doing wrong when using the C interface?

1 answer:

Answer 0: (score: 4)

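You have swapped the "to" and "from" character set arguments to iconv_open: the first argument is the target encoding and the second is the source encoding. As it happens, the character in slot D0 of KOI8-R has D0 as the first byte of its UTF-8 encoding, which is why the program prints d0.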
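
To make the fix concrete, here is a minimal sketch of the conversion with the arguments in the right order. The helper name utf8_to_koi8r and its signature are my own, for illustration only; they are not part of iconv or of the question's code. It assumes the glibc-style iconv prototype, where the input pointer is a char **.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): converts in_len bytes of
 * UTF-8 at `in` to KOI8-R, writing at most out_cap bytes to `out`.
 * Returns the number of bytes written, or (size_t) -1 on failure. */
size_t utf8_to_koi8r(const char *in, size_t in_len, char *out, size_t out_cap) {
    /* iconv_open takes the target encoding first, then the source encoding. */
    iconv_t cd = iconv_open("KOI8-R", "UTF-8");
    if (cd == (iconv_t) -1) {
        return (size_t) -1;
    }

    /* iconv advances these pointers and decrements the byte counters,
     * so work on copies and keep the caller's values intact. */
    char *in_ptr = (char *) in;
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t) -1) {
        return (size_t) -1;
    }
    return out_cap - out_left;
}
```

With the question's input, utf8_to_koi8r("\xd0\x9f", 2, out, 1) should write the single byte f0, so a one-byte output buffer is enough. This probably also explains the 4-byte requirement in the original program: in the swapped direction both input bytes are read as KOI8-R characters (D0 is п, 9F is ÷), and each of those takes two bytes in UTF-8.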