I'm trying to use iconv to write a function that converts an S-JIS string to a UTF-8 string.
I wrote the following code:
#include <iconv.h>
#include <iostream>
#include <stdio.h>
using namespace std;
#define BUF_SIZE 1024
size_t z = (size_t) BUF_SIZE-1;
bool sjis2utf8( char* text_sjis, char* text_utf8 )
{
    iconv_t ic;
    ic = iconv_open("UTF8", "SJIS"); // sjis->utf8
    iconv(ic , &text_sjis, &z, &text_utf8, &z);
    iconv_close(ic);
    return true;
}
int main(void)
{
    char hello[BUF_SIZE] = "hello";
    char bye[BUF_SIZE] = "bye";
    char tmp[BUF_SIZE] = "something else";
    sjis2utf8(hello, tmp);
    cout << tmp << endl;
    sjis2utf8(bye, tmp);
    cout << tmp << endl;
}
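The output should be
hello
bye
but it is actually
hello
hello
Does anyone know why this happens? What is wrong with my program?
Note that "hello" and "bye" are Japanese S-JIS strings in my original program; I changed them here to make the program easier to read.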
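Answer 0 (score: 3)
I think you are misusing the iconv function by passing the global variable z to it. The first time sjis2utf8 is called, z is decremented to 0. The second call to sjis2utf8 then has no effect (z == 0) and leaves tmp unchanged.
From the iconv documentation: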
size_t iconv (iconv_t cd,
const char* * inbuf, size_t * inbytesleft,
char* * outbuf, size_t * outbytesleft);
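The iconv function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes, and it updates the conversion state contained in cd.
You should use two separate variables for the buffer lengths: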
size_t il = BUF_SIZE - 1 ;
size_t ol = BUF_SIZE - 1 ;
iconv(ic, &text_sjis, &il, &text_utf8, &ol) ;
Then check the return value of iconv and the remaining buffer lengths to confirm that the conversion succeeded.
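As a rough illustration of that advice, the sketch below uses two fresh local counters on every call and checks both iconv_open and iconv for failure. The helper name convert_checked and its parameters are made up for this example and are not part of the question's code. It assumes the POSIX signature in which inbuf is a char**; some older libiconv builds declare it as const char**.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): converts a NUL-terminated
// string and reports failure instead of always returning true.
// On failure, errno keeps the value set by iconv (E2BIG, EILSEQ or EINVAL).
bool convert_checked(const char* to, const char* from,
                     char* in, char* out, std::size_t out_capacity)
{
    if (out_capacity == 0) return false;

    iconv_t ic = iconv_open(to, from);
    if (ic == (iconv_t)-1) return false;   // conversion pair not supported

    std::size_t il = std::strlen(in);      // input bytes, recomputed on every call
    std::size_t ol = out_capacity - 1;     // keep one byte for the terminator

    std::size_t rc = iconv(ic, &in, &il, &out, &ol);
    int saved_errno = errno;               // iconv_close may overwrite errno
    iconv_close(ic);

    if (rc == (std::size_t)-1) {
        errno = saved_errno;
        return false;
    }
    *out = '\0';                           // iconv advanced out past the last byte written
    return il == 0;                        // all input bytes were consumed
}
```

With this shape, calling the helper twice in a row works because neither counter survives between calls; for the question's case the encodings would be "UTF-8" and "SJIS".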
Answer 1 (score: 1)
#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
const size_t BUF_SIZE = 1024;
class IConv {
    iconv_t ic_;
public:
    IConv(const char* to, const char* from)
        : ic_(iconv_open(to, from)) { }
    ~IConv() { iconv_close(ic_); }
    // Returns true on success; out_size is updated to the output bytes left.
    bool convert(char* input, char* output, size_t& out_size) {
        // +1 so the terminating NUL is converted as well. This assumes the
        // S-JIS string is NUL-terminated; if it is not, or if it contains
        // NUL bytes, the input length has to be provided some other way.
        size_t inbufsize = strlen(input) + 1;
        // iconv returns (size_t)-1 on error, so compare instead of
        // converting the count to bool.
        return iconv(ic_, &input, &inbufsize, &output, &out_size) != (size_t)-1;
    }
};
int main(void)
{
    char hello[BUF_SIZE] = "hello";
    char bye[BUF_SIZE] = "bye";
    char tmp[BUF_SIZE] = "something else";
    IConv ic("UTF8", "SJIS");
    size_t outsize = BUF_SIZE; // must be reset before every call
    ic.convert(hello, tmp, outsize);
    cout << tmp << endl;
    outsize = BUF_SIZE;
    ic.convert(bye, tmp, outsize);
    cout << tmp << endl;
}
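Answer 2 (score: 0)
You must pass the length of the input string as the third argument of iconv.
Try:
//...
size_t len = strlen(text_sjis) + 1; // size_t, not int; +1 converts the terminating NUL too
z = BUF_SIZE - 1;                   // reset the output space before each call
iconv(ic , &text_sjis, &len, &text_utf8, &z);
//...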
Answer 3 (score: 0)
size_t iconv (iconv_t cd,
const char* * inbuf, size_t * inbytesleft,
char* * outbuf, size_t * outbytesleft);
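iconv changes the value that inbytesleft points to, so after your first call z is 0. To fix this, compute the length of inbuf and store it in a local variable before each conversion.
This is described here: http://www.gnu.org/s/libiconv/documentation/libiconv/iconv.3.html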
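To make the effect on the counters visible, here is a small sketch that runs one conversion and hands back the values iconv left in the two counters. The struct Counters and the function run_once are names invented for this illustration, and the return value of iconv is deliberately ignored to keep it short.

```cpp
#include <iconv.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only: shows what iconv does
// to the two byte counters during a single call.
struct Counters {
    std::size_t in_left;   // value of *inbytesleft after the call
    std::size_t out_left;  // value of *outbytesleft after the call
};

Counters run_once(iconv_t ic, char* in, char* out, std::size_t out_capacity)
{
    std::size_t in_left = std::strlen(in);  // local, so every call starts fresh
    std::size_t out_left = out_capacity;    // separate variable for the output side
    iconv(ic, &in, &in_left, &out, &out_left);  // error checking omitted for brevity
    return Counters{in_left, out_left};
}
```

After a successful conversion in_left is 0 and out_left has shrunk by the number of bytes written. That is why a single global counter shared between calls (and between both arguments) stops working after the first conversion.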
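Since you tagged this as C++, I would suggest wrapping everything in a nice little class. As far as I can tell from the documentation, you can reuse the iconv_t obtained from iconv_open for as many conversions as you like.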
#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
const size_t BUF_SIZE = 1024;
class IConv {
    iconv_t ic_;
public:
    IConv(const char* to, const char* from)
        : ic_(iconv_open(to, from)) { }
    ~IConv() { iconv_close(ic_); }
    // Returns true on success. outbufsize is passed by value, so every
    // call starts with the full output capacity.
    bool convert(char* input, char* output, size_t outbufsize) {
        // +1 so the terminating NUL is converted too; without it the tail
        // of the previous contents of output would remain visible.
        size_t inbufsize = strlen(input) + 1;
        // iconv returns (size_t)-1 on error
        return iconv(ic_, &input, &inbufsize, &output, &outbufsize) != (size_t)-1;
    }
};
int main(void)
{
    char hello[BUF_SIZE] = "hello";
    char bye[BUF_SIZE] = "bye";
    char tmp[BUF_SIZE] = "something else";
    IConv ic("UTF8", "SJIS");
    ic.convert(hello, tmp, BUF_SIZE);
    cout << tmp << endl;
    ic.convert(bye, tmp, BUF_SIZE);
    cout << tmp << endl;
}
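If fixed-size char arrays are not a requirement, the same idea can be pushed a bit further. The following is only a sketch under a few assumptions: the class name Converter and its interface are invented here, errors are reported with exceptions, and the target encoding is assumed to be stateless (as UTF-8 is), so no final flush call is made. It reuses one descriptor for many conversions and grows the result as needed when iconv reports E2BIG.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper; the names are illustrative, not part of libiconv.
class Converter {
    iconv_t ic_;
public:
    Converter(const char* to, const char* from) : ic_(iconv_open(to, from)) {
        if (ic_ == (iconv_t)-1)
            throw std::runtime_error("iconv_open failed: unsupported conversion");
    }
    ~Converter() { iconv_close(ic_); }
    Converter(const Converter&) = delete;             // avoid closing the descriptor twice
    Converter& operator=(const Converter&) = delete;

    std::string convert(const std::string& input) {
        if (input.empty()) return std::string();
        iconv(ic_, nullptr, nullptr, nullptr, nullptr);  // reset the conversion state

        std::string in_copy = input;          // iconv needs a mutable char*
        char* in = &in_copy[0];
        std::size_t in_left = in_copy.size(); // local counter, fresh on every call

        std::string result;
        std::vector<char> buf(256);           // scratch output, reused per chunk
        while (in_left > 0) {
            char* out = buf.data();
            std::size_t out_left = buf.size();
            std::size_t rc = iconv(ic_, &in, &in_left, &out, &out_left);
            int err = errno;                  // save before any other call can change it
            result.append(buf.data(), buf.size() - out_left);
            if (rc == (std::size_t)-1 && err != E2BIG)
                throw std::runtime_error("iconv failed: invalid or incomplete input");
            // On E2BIG the loop continues with an empty scratch buffer.
        }
        return result;
    }
};
```

A caller would construct it once, for example as Converter conv("UTF-8", "SJIS"), and then call conv.convert(text) for each string; whether the name "SJIS" is accepted depends on the iconv implementation installed.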