How do I remove UTF-8 characters from a char * string?

Asked: 2019-06-14 20:46:30

Tags: c++ c++11 unicode utf-8

Regarding the question "How to replace/ignore invalid Unicode/UTF8 characters � from C stdio.h getline()?", I wrote up a possible solution for it, but I could not get it to work correctly.

Here is the complete example:

FILE* cfilestream = fopen( "/filepath.txt", "r" );
size_t linebuffersize = 131072;  // getline() takes a size_t*, not an int*
char* readline = (char*) malloc( linebuffersize );
char* fixedreadline = (char*) malloc( linebuffersize );

int index;
ssize_t charsread;  // getline() returns ssize_t, -1 on EOF or error
int invalidcharsoffset;

while( true )
{
    if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
    {
        invalidcharsoffset = 0;
        for( index = 0; index < charsread; ++index )
        {
            // This comparison is the line the compiler warns about.
            if( readline[index] != '�' ) {
                fixedreadline[index-invalidcharsoffset] = readline[index];
            }
            else {
                ++invalidcharsoffset;
            }
        }
        // Terminate the copy so stale bytes from a longer previous line are not printed.
        fixedreadline[charsread-invalidcharsoffset] = '\0';
        std::cerr << "fixedreadline=" << fixedreadline << std::endl;
    }
    else {
        break;
    }
}

Compiling it produces the following warning:

  $ x86_64-linux-gnu-gcc -g -O0 -Wall -ggdb -std=c++11 
  source/fastfile.cpp:512:44: warning: multi-character character constant [-Wmultichar]
                       if( readline[index] != '�' ) {
                                              ^~~~~

And when I run the program, it does not remove the � character from the input string Føö�Bår.
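
One possible reading of the warning, offered as a sketch rather than a confirmed answer: in a UTF-8 encoded source file, � (U+FFFD) is stored as the three bytes EF BF BD, so '�' is a multi-character constant of type int and can never compare equal to a single char. The sketch below assumes the input really contains U+FFFD encoded that way and removes each three-byte occurrence in place. The function name strip_replacement_char is hypothetical and not part of the original code. If the file instead contains raw invalid bytes that the terminal merely displays as �, this will not match them, and a real UTF-8 validity check would be needed.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes every UTF-8 encoded U+FFFD (bytes EF BF BD)
// from the first `length` bytes of `line`, in place, and returns the new length.
// Assumes `line` has room for a terminating '\0' at index `length`,
// which holds for buffers filled by getline().
std::size_t strip_replacement_char( char* line, std::size_t length )
{
    static const char marker[] = "\xEF\xBF\xBD";
    const std::size_t markersize = sizeof( marker ) - 1;  // 3 bytes, without '\0'

    std::size_t in = 0;
    std::size_t out = 0;
    while( in < length )
    {
        if( length - in >= markersize
            && std::memcmp( line + in, marker, markersize ) == 0 ) {
            in += markersize;  // skip the whole three-byte sequence
        }
        else {
            line[out++] = line[in++];  // keep every other byte unchanged
        }
    }
    line[out] = '\0';
    return out;
}
```

Inside the read loop this could be called as charsread = strip_replacement_char( readline, charsread ); after which readline can be printed directly, without the second buffer.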

0 answers:

No answers