iconv only works once

Time: 2011-11-12 11:28:27

Tags: c++ iconv

I am trying to write a function that converts an S-JIS string into a UTF-8 string using iconv. I wrote the code below:

#include <iconv.h>
#include <iostream>
#include <stdio.h>
using namespace std;

#define BUF_SIZE 1024
size_t z = (size_t) BUF_SIZE-1;

bool sjis2utf8( char* text_sjis, char* text_utf8 )
{
  iconv_t ic;
  ic = iconv_open("UTF8", "SJIS"); // sjis->utf8
  iconv(ic , &text_sjis, &z, &text_utf8, &z);
  iconv_close(ic);
  return true;
}
int main(void)
{
  char hello[BUF_SIZE] = "hello";
  char bye[BUF_SIZE] = "bye";
  char tmp[BUF_SIZE] = "something else";

  sjis2utf8(hello, tmp);
  cout << tmp << endl;

  sjis2utf8(bye, tmp);
  cout << tmp << endl;
}

The output should be

hello
bye

but the actual output is

hello
hello

Does anyone know why this happens? What is wrong with my program?

Note that "hello" and "bye" were Japanese S-JIS strings in my original program; I changed them to make the program easier to read.

4 answers:

Answer 0 (score: 3)

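I think you are misusing the iconv function by passing the global variable z, which you use as both inbytesleft and outbytesleft. The first time you call sjis2utf8, z is decremented to 0. The second call to sjis2utf8 therefore does nothing (z == 0) and leaves tmp unchanged.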

From the iconv documentation:

size_t iconv (iconv_t cd,
              const char* * inbuf, size_t * inbytesleft,
              char* * outbuf, size_t * outbytesleft);
  

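The iconv function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes, and it updates the conversion state contained in cd.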

You should use two separate variables for the buffer lengths:

size_t il = BUF_SIZE - 1 ;
size_t ol = BUF_SIZE - 1 ;

iconv(ic, &text_sjis, &il, &text_utf8, &ol) ;

Then check iconv's return value and the remaining buffer lengths to find out whether the conversion succeeded.

Answer 1 (score: 1)

#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>

using namespace std;

const size_t BUF_SIZE=1024;


class IConv {
    iconv_t ic_;
public:
    IConv(const char* to, const char* from) 
        : ic_(iconv_open(to,from))    { }
    ~IConv() { iconv_close(ic_); }

    bool convert(char* input, char* output, size_t& out_size) {
        size_t inbufsize = strlen(input)+1;// s-jis string should be null terminated,
                                           // if s-jis is not null terminated or it has
                                           // multiple byte chars with null in them this
                                           // will not work, or to provide in other way
                                           // input buffer length....
        // iconv returns (size_t)-1 on failure; compare explicitly instead of
        // letting the size_t convert to bool (0, i.e. success, would become false).
        return iconv(ic_, &input, &inbufsize, &output, &out_size) != (size_t)-1;
    }
};

int main(void)
{
    char hello[BUF_SIZE] = "hello";
    char bye[BUF_SIZE] = "bye";
    char tmp[BUF_SIZE] = "something else";
    IConv ic("UTF8","SJIS");

    size_t outsize = BUF_SIZE; // iconv decrements this, so reset it before every call
    ic.convert(hello, tmp, outsize);
    cout << tmp << endl;

    outsize = BUF_SIZE;
    ic.convert(bye, tmp, outsize);
    cout << tmp << endl;
}
  • Based on Kleist's answer

Answer 2 (score: 0)

You have to pass the length of the input string as the third argument of iconv.

Try:

//...
size_t len = strlen(text_sjis); // must be size_t: iconv takes a size_t*, not an int*
iconv(ic , &text_sjis, &len, &text_utf8, &z);
//...

Answer 3 (score: 0)

size_t iconv (iconv_t cd,
          const char* * inbuf, size_t * inbytesleft,
          char* * outbuf, size_t * outbytesleft);

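iconv changes the value pointed to by inbytesleft, so after your first run z is 0. To fix this, compute the length of inbuf and store it in a local variable before each conversion.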

It is described here: http://www.gnu.org/s/libiconv/documentation/libiconv/iconv.3.html

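Since you tagged this C++, I would suggest wrapping everything in a nice little class. As far as I can tell from the documentation, you can reuse the iconv_t obtained from iconv_open for as many conversions as you like.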

#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>

using namespace std;

const size_t BUF_SIZE = 1024;

class IConv {
    iconv_t ic_;
public:
    IConv(const char* to, const char* from) 
        : ic_(iconv_open(to,from))    { }

    ~IConv() { iconv_close(ic_); }

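    bool convert(char* input, char* output, size_t outbufsize) {
        // +1 so the terminating NUL is converted too; otherwise "hello" only
        // overwrites the start of tmp and the rest of "something else" remains.
        size_t inbufsize = strlen(input) + 1;
        // iconv returns (size_t)-1 on failure; don't let size_t convert to bool.
        return iconv(ic_, &input, &inbufsize, &output, &outbufsize) != (size_t)-1;
    }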
};

int main(void)
{
    char hello[BUF_SIZE] = "hello";
    char bye[BUF_SIZE] = "bye";
    char tmp[BUF_SIZE] = "something else";
    IConv ic("UTF8","SJIS");


    ic.convert(hello, tmp, BUF_SIZE);
    cout << tmp << endl;

    ic.convert(bye, tmp, BUF_SIZE);
    cout << tmp << endl;
}