How to convert LPWSTR to char * with UTF-8 encoding

Time: 2016-07-11 21:32:19

Tags: c++ string qt unicode utf-8

I'm working on a cross-platform project using Qt. On Windows, I want to pass Unicode characters (for instance, a file path that contains Chinese characters) as arguments when launching the application from the command line, and then use these arguments to create a QCoreApplication.

For various reasons, I need to use CommandLineToArgvW to get the argument list, like this:

LPWSTR * argvW = CommandLineToArgvW( GetCommandLineW(), &argc );

I understand that on modern Windows, LPWSTR is actually wchar_t*, which is 16 bits wide and holds UTF-16 text.

However, the QCoreApplication constructor only takes char*, not wchar_t*.

So the question is: how can I safely convert the LPWSTR returned by CommandLineToArgvW() to char* without losing the Unicode content (i.e. so that the Chinese characters are still Chinese characters)?

I've tried many different ways without success:

1:

    std::string const argvString = boost::locale::conv::utf_to_utf<char>( argvW[0] );

2:

    int res;
    char buf[0x400];
    char* pbuf = buf;
    boost::shared_ptr<char[]> shared_pbuf;

    res = WideCharToMultiByte(CP_UTF8, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);

3: Convert to QString first, then convert to UTF-8.

EDIT: Problem solved. The UTF-16 wide character to UTF-8 char conversion actually works fine with all three approaches. In Visual Studio, to view a UTF-8 string correctly in the debugger, it's necessary to append the s8 format specifier to the watched variable name, as in myString,s8 (see: https://msdn.microsoft.com/en-us/library/75w45ekt.aspx). This is the part I overlooked, and it made me think that my string conversion was wrong.
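
For reference, the conversion itself is mechanical and does not depend on Windows. The following is a minimal portable sketch of what a UTF-16 to UTF-8 conversion has to do. `utf16ToUtf8` is a hypothetical helper name, not part of Qt, Boost, or the Windows API, and replacing unpaired surrogates with U+FFFD is an assumption made here; the three approaches above may handle malformed input differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (illustration only): converts UTF-16 code units
// to UTF-8 bytes. Unpaired surrogates become U+FFFD (an assumption).
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A Chinese character in the Basic Multilingual Plane is one UTF-16 code unit and becomes three UTF-8 bytes, so a debugger that shows those bytes in the local code page will display garbage even though the conversion is correct.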

The real issue is that when calling QCoreApplication::arguments(), the returned QString is constructed by QString::fromLocal8Bit(), which causes encoding issues on Windows when the command-line arguments contain Unicode characters. The workaround is, whenever the command-line arguments are needed on Windows, to call the Windows API CommandLineToArgvW() and convert the 16-bit UTF-16 wchar_t* (or LPWSTR) to 8-bit UTF-8 char* (using one of the three approaches mentioned above).
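
If the converted strings are then needed in argc/argv form, something has to own their storage for as long as the pointers are in use (the Qt documentation requires argc and argv to stay valid for the lifetime of the QCoreApplication object). Below is a rough sketch of one way to do that. `Utf8Args` and its member names are hypothetical, and the sketch only covers ownership; it does not change how Qt itself decodes argv.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only): owns UTF-8 argument strings
// and exposes them in the argc/argv shape.
class Utf8Args
{
public:
    explicit Utf8Args(std::vector<std::string> args)
        : storage_(std::move(args))
    {
        pointers_.reserve(storage_.size() + 1);
        for (std::string& s : storage_)
            pointers_.push_back(&s[0]); // null-terminated since C++11
        pointers_.push_back(nullptr);   // argv[argc] is conventionally null
        count_ = static_cast<int>(storage_.size());
    }

    // Copying would leave the pointers aimed at another object's strings.
    Utf8Args(const Utf8Args&) = delete;
    Utf8Args& operator=(const Utf8Args&) = delete;

    int& argc() { return count_; }   // a reference, since some APIs take int&
    char** argv() { return pointers_.data(); }

private:
    std::vector<std::string> storage_;
    std::vector<char*> pointers_;
    int count_ = 0;
};
```

The object would need to be declared before, and destroyed after, whatever consumes the argc/argv pair.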

2 answers:

Answer 0 (score: 2)

You should be able to use QString's functions. For example:

QString str = QString::fromUtf16((const ushort*)argvW[0]);
::MessageBoxW(0, (const wchar_t*)str.utf16(), 0, 0);

When using WideCharToMultiByte, pass zero for the output buffer and for the output buffer's length. The function then returns the number of bytes the output buffer needs (including the terminating null, because the input length is given as -1). For example:

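const wchar_t* wbuf = argvW[0];
// First call: query the required size in bytes, including the terminating null.
int len = WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, 0, 0, 0, 0);

std::string buf(len, 0);

// Second call: convert into the buffer that was just sized.
WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, &buf[0], len, 0, 0);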
QString utf8;
utf8 = QString::fromUtf8(buf.c_str());
::MessageBoxW(0, (const wchar_t*)utf8.utf16(), 0, 0);

QCoreApplication::arguments() should provide the same information. For example, run this code with a Unicode argument and inspect the output file:

#include <QCoreApplication>
#include <QFile>
#include <QStringList>
#include <QTextStream>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    QString filename = QString::fromUtf8("ελληνική.txt");
    QFile fout(filename);
    if (fout.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        QTextStream oss(&fout);
        oss.setCodec("UTF-8");
        oss << filename << "\n";
        QStringList list = a.arguments();
        for (int i = 0; i < list.count(); i++)
            oss << list[i] << "\n";
    }
    fout.close();
    return a.exec();
}

Note that in the example above, Qt converts the filename to UTF-16 internally, because the Windows API uses UTF-16, not UTF-8.

Answer 1 (score: 2)

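Qt internally wraps int main(), unpacking and parsing the Unicode command-line arguments (via CommandLineToArgvW) before any of your code runs. The parsed data is then converted to the local 8-bit encoding via QString::toLocal8Bit() and exposed as the equivalent char **argv.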

Use QCoreApplication::arguments() to retrieve the Unicode arguments. Also, a useful note from the documentation:

  

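On Windows, the list is built from the argc and argv parameters only if modified argv/argc parameters are passed to the constructor. In that case, encoding problems might occur.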