将C ++中的unicode字符串转换为大写字母

时间:2013-08-01 10:25:31

标签: c++ string unicode

我们如何在C或C ++中将多语言字符串或unicode字符串转换为大写/小写。

9 个答案:

答案 0 :(得分:11)

如果你的系统已经是UTF-8,使用std::use_facet,你可以写:

#include <iostream>
#include <locale.h>

int main() {
    std::locale::global(std::locale(""));  // (*)
    std::wcout.imbue(std::locale());
    auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());

    std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.";

    f.toupper(&str[0], &str[0] + str.size());
    std::wcout << str << std::endl;

    return 0;
}

你得到(http://ideone.com/AFHoHC):

  

ZOËSALDAÑA在LAMALDICIÓNDELPADRE CARDONA演出。

如果它不起作用,您必须将(*)更改为std::locale::global(std::locale("en_US.UTF8"));或您在平台上实际拥有的UTF-8语言环境。

答案 1 :(得分:4)

我找到了2个问题的解决方案_

1。 setlocale(LC_CTYPE,“en_US.UTF-8”); //语言环境将是启用UTF-8的英语

std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.ëèñ";

std::wcout << str << std::endl;

for (wstring::iterator it = str.begin(); it != str.end(); ++it)
    *it = towupper(*it);

std::wcout << "toUpper_onGCC_LLVM_1 :: "<< str << std::endl;

这适用于LLVM GCC 4.2编译器。

2。 的std ::区域设置::全局(标准::区域( “是en_US.UTF-8”)); //语言环境将是启用UTF-8的英语

std::wcout.imbue(std::locale());
const std::ctype<wchar_t>& f = std::use_facet< std::ctype<wchar_t> >(std::locale());

std::wstring str = L"Chloëè";//"Zoë Saldaña played in La maldición del padre Cardona.";

f.toupper(&str[0], &str[0] + str.size());   

std::wcout << str << std::endl;

这适用于Apple LLVM 4.2。

这两种情况我都是在Xocde上运行的。 但我找到了一种在Eclipse中使用g ++ Compiler运行此代码的方法。

答案 2 :(得分:3)

如果你要做得好的话会有很多困难。

通常的用例是出于比较的目的,但问题比这更普遍。

来自Matt Austern的大约2000年的C ++报告中有一篇相当详细的论文here(PDF)

答案 3 :(得分:2)

设置locale first,例如:

setlocale(LC_ALL, "German")); /*This won't work as per comments below */

setlocale(LC_ALL, "English"));

setlocale( LC_MONETARY, "French" );

setlocale( LC_ALL, "" ); //default locale 

然后使用

std::use_facet std::locale如下: -

typedef std::string::value_type char_t;
char_t upcase( char_t ch )
{
 return std::use_facet< std::ctype< char_t > >( std::locale() ).toupper( ch );
}

std::string toupper( const std::string &src )
{
 std::string result;
 std::transform( src.begin(), src.end(), std::back_inserter( result ), upcase );
 return result;
}

const std::string src  = "Hello World!";
std::cout << toupper( src );

答案 4 :(得分:2)

在Windows中,对于区域设置未知的混合语言应用程序,请考虑CharUpperBuffWCharLowerBuffW。这些函数处理toupper()没有的变音符号。

答案 5 :(得分:1)

您可以遍历wstring并使用towupper / towlower

for (wstring::iterator it = a.begin(); it != a.end(); ++it)
        *it = towupper(*it);

答案 6 :(得分:1)

如果您想要一个理智而成熟的解决方案,请查看IBM's ICU。这是一个例子:

#include <iostream>
#include <unicode/unistr.h>
#include <string>

int main(){
    icu::UnicodeString us("óóßChloë");
    us.toUpper(); //convert to uppercase in-place
    std::string s;
    us.toUTF8String(s);
    std::cout<<"Upper: "<<s<<"\n";

    us.toLower(); //convert to lowercase in-place
    s.clear();
    us.toUTF8String(s);
    std::cout<<"Lower: "<<s<<"\n";
    return 0;
}

输出:

Upper: ÓÓSSCHLOË
Lower: óósschloë

注意:在后面的步骤中,SS未被视为德语ß的首都

答案 7 :(得分:0)

对于C,我会在调整当前线程中的C语言环境后使用toupper

setlocale(LC_CTYPE, "en_US.UTF8");

对于C ++,我会使用toupper的{​​{1}}方法:

std::ctype<char>

答案 8 :(得分:0)

基于Kyle_the_hacker's -----> answer及其附加内容。

Ubuntu

在终端中 列出所有语言环境
locale -a

安装所有语言环境
sudo apt-get install -y locales locales-all

编译main.cpp
$ g++ main.cpp

运行编译的程序
$ ./a.out

结果

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

Ubuntu Linux - WSL from VSCODE

Ubuntu Linux - WSL

Windows

在cmd中运行VCVARS开发人员工具
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"

编译main.cpp
> cl /EHa main.cpp /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /std:c++17 /DYNAMICBASE "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MTd

Compilador de optimización de C/C++ de Microsoft (R) versión 19.27.29111 para x64
(C) Microsoft Corporation. Todos los derechos reservados.

main.cpp
Microsoft (R) Incremental Linker Version 14.27.29111.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib

运行main.exe
>main.exe

结果

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

Windows

代码-main.cpp

此代码仅在Windows x64和Ubuntu Linux x64上进行过测试。

/*
 * Filename: c:\Users\x\Cpp\main.cpp
 * Path: c:\Users\x\Cpp
 * Filename: /home/x/Cpp/main.cpp
 * Path: /home/x/Cpp
 * Created Date: Saturday, October 17th 2020, 10:43:31 pm
 * Author: Joma
 *
 * No Copyright 2020
 */

#include <iostream>
#include <locale>
#include <string>
#include <algorithm>
#include <set>
#include <cstdlib>
#include <clocale>

#if defined(_WIN32)
#define WINDOWSLIB 1
#define DLLCALL STDCALL
#define DLLIMPORT _declspec(dllimport)
#define DLLEXPORT _declspec(dllexport)
#define DLLPRIVATE

#define NOMINMAX
#include <Windows.h>
#include <objbase.h>
#include <filesystem>
#include <intrin.h>
#include <conio.h>

#elif defined(__ANDROID__) || defined(ANDROID) //Android
#define ANDROIDLIB 1
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

#elif defined(__APPLE__) //iOS, Mac OS
#define MACOSLIB 1
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

#elif defined(__LINUX__) || defined(__gnu_linux__) || defined(__linux__) || defined(__linux) || defined(linux) //_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#include <cpuid.h>
#include <experimental/filesystem>
#include <unistd.h>
#include <termios.h>
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#define CoTaskMemAlloc(p) malloc(p)
#define CoTaskMemFree(p) free(p)

#elif defined(__EMSCRIPTEN__)
#define EMSCRIPTENLIB 1
#include <unistd.h>
#include <termios.h>
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

#endif

typedef std::string String;
typedef std::wstring WString;
#define LINE_FEED_CHAR (static_cast<char>(10))

enum class ConsoleTextStyle
{
    DEFAULT = 0,
    BOLD = 1,
    FAINT = 2,
    ITALIC = 3,
    UNDERLINE = 4,
    SLOW_BLINK = 5,
    RAPID_BLINK = 6,
    REVERSE = 7,
};

enum class ConsoleForeground
{
    DEFAULT = 39,
    BLACK = 30,
    DARK_RED = 31,
    DARK_GREEN = 32,
    DARK_YELLOW = 33,
    DARK_BLUE = 34,
    DARK_MAGENTA = 35,
    DARK_CYAN = 36,
    GRAY = 37,
    DARK_GRAY = 90,
    RED = 91,
    GREEN = 92,
    YELLOW = 93,
    BLUE = 94,
    MAGENTA = 95,
    CYAN = 96,
    WHITE = 97
};

enum class ConsoleBackground
{
    DEFAULT = 49,
    BLACK = 40,
    DARK_RED = 41,
    DARK_GREEN = 42,
    DARK_YELLOW = 43,
    DARK_BLUE = 44,
    DARK_MAGENTA = 45,
    DARK_CYAN = 46,
    GRAY = 47,
    DARK_GRAY = 100,
    RED = 101,
    GREEN = 102,
    YELLOW = 103,
    BLUE = 104,
    MAGENTA = 105,
    CYAN = 106,
    WHITE = 107
};

class Console
{
public:
    static void Clear();
    static void WriteLine(const String &s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {});
    static void Write(const String &s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {});
    static void WriteLine(const WString &s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {});
    static void Write(const WString &s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {});
    static void WriteLine();
    static void Pause();
    static int PauseAny(bool printWhenPressed = false);

private:
    static void EnableVirtualTermimalProcessing();
    static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles);
    static void ResetTerminalFormat();
};

class Strings
{
public:
    static String WideStringToString(const WString &wstr);
    static WString StringToWideString(const String &str);
    static WString ToUpper(const WString &data);
    static String ToUpper(const String &data);
    static WString ToLower(const WString &data);
    static String ToLower(const String &data);
};

String Strings::WideStringToString(const WString &wstr)
{
    if (wstr.empty())
    {
        return String();
    }
    size_t pos;
    size_t begin = 0;
    String ret;
    size_t size;
#ifdef WINDOWSLIB
    pos = wstr.find(static_cast<wchar_t>(0), begin);
    while (pos != WString::npos && begin < wstr.length())
    {
        WString segment = WString(&wstr[begin], pos - begin);
        wcstombs_s(&size, nullptr, 0, &segment[0], _TRUNCATE);
        String converted = String(size, 0);
        wcstombs_s(&size, &converted[0], size, &segment[0], _TRUNCATE);
        ret.append(converted);
        begin = pos + 1;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
    }
    if (begin <= wstr.length())
    {
        WString segment = WString(&wstr[begin], wstr.length() - begin);
        wcstombs_s(&size, nullptr, 0, &segment[0], _TRUNCATE);
        String converted = String(size, 0);
        wcstombs_s(&size, &converted[0], size, &segment[0], _TRUNCATE);
        converted.resize(size - 1);
        ret.append(converted);
    }
#elif defined LINUXLIB
    pos = wstr.find(static_cast<wchar_t>(0), begin);
    while (pos != WString::npos && begin < wstr.length())
    {
        WString segment = WString(&wstr[begin], pos - begin);
        size = wcstombs(nullptr, segment.c_str(), 0);
        String converted = String(size, 0);
        wcstombs(&converted[0], segment.c_str(), converted.size());
        ret.append(converted);
        ret.append({0});
        begin = pos + 1;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
    }
    if (begin <= wstr.length())
    {
        WString segment = WString(&wstr[begin], wstr.length() - begin);
        size = wcstombs(nullptr, segment.c_str(), 0);
        String converted = String(size, 0);
        wcstombs(&converted[0], segment.c_str(), converted.size());
        ret.append(converted);
    }
#elif defined MACOSLIB
#endif

    return ret;
}

WString Strings::StringToWideString(const String &str)
{
    if (str.empty())
    {
        return WString();
    }

    size_t pos;
    size_t begin = 0;
    WString ret;
    size_t size;

#ifdef WINDOWSLIB
    pos = str.find(static_cast<char>(0), begin);
    while (pos != String::npos)
    {
        String segment = String(&str[begin], pos - begin);
        WString converted = WString(segment.size() + 1, 0);

        mbstowcs_s(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE);
        converted.resize(size - 1);
        ret.append(converted);
        ret.append({0});
        begin = pos + 1;
        pos = str.find(static_cast<char>(0), begin);
    }
    if (begin < str.length())
    {
        String segment = String(&str[begin], str.length() - begin);
        WString converted = WString(segment.size() + 1, 0);
        mbstowcs_s(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE);
        converted.resize(size - 1);
        ret.append(converted);
    }
#elif defined LINUXLIB
    pos = str.find(static_cast<char>(0), begin);
    while (pos != String::npos)
    {
        String segment = String(&str[begin], pos - begin);
        WString converted = WString(segment.size(), 0);
        size = mbstowcs(&converted[0], &segment[0], converted.size());
        converted.resize(size);
        ret.append(converted);
        ret.append({0});
        begin = pos + 1;
        pos = str.find(static_cast<char>(0), begin);
    }
    if (begin < str.length())
    {
        String segment = String(&str[begin], str.length() - begin);
        WString converted = WString(segment.size(), 0);
        size = mbstowcs(&converted[0], &segment[0], converted.size());
        converted.resize(size);
        ret.append(converted);
    }
#elif defined MACOSLIB
#endif

    return ret;
}

WString Strings::ToUpper(const WString &data)
{
    WString result = data;
    auto &f = std::use_facet<std::ctype<wchar_t>>(std::locale());

    f.toupper(&result[0], &result[0] + result.size());
    return result;
}

String Strings::ToUpper(const String &data)
{
    return WideStringToString(ToUpper(StringToWideString(data)));
}

WString Strings::ToLower(const WString &data)
{
    WString result = data;
    auto &f = std::use_facet<std::ctype<wchar_t>>(std::locale());
    f.tolower(&result[0], &result[0] + result.size());
    return result;
}

String Strings::ToLower(const String &data)
{
    return WideStringToString(ToLower(StringToWideString(data)));
}

void Console::Clear()
{

#ifdef WINDOWSLIB
    std::system(u8"cls");
#elif defined LINUXLIB
    std::system(u8"clear");
#elif defined EMSCRIPTENLIB
    emscripten::val::global()["console"].call<void>(u8"clear");
#elif defined MACOSLIB
#endif
}

void Console::Pause()
{
    char c;
    do
    {
        c = getchar();
    } while (c != LINE_FEED_CHAR);
}

int Console::PauseAny(bool printWhenPressed)
{
    int ch;
#ifdef WINDOWSLIB
    ch = _getch();
#elif defined LINUXLIB
    struct termios oldt, newt;
    tcgetattr(STDIN_FILENO, &oldt);
    newt = oldt;
    newt.c_lflag &= ~(ICANON | ECHO);
    tcsetattr(STDIN_FILENO, TCSANOW, &newt);
    ch = getchar();
    tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
#elif defined MACOSLIB
#endif
    return ch;
}

void Console::EnableVirtualTermimalProcessing()
{
#if defined WINDOWSLIB
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD dwMode = 0;
    GetConsoleMode(hOut, &dwMode);
    if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
    {
        dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
        SetConsoleMode(hOut, dwMode);
    }
#endif
}

void Console::ResetTerminalFormat()
{
    std::cout << u8"\033[0m";
}

void Console::SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
{
    String format = u8"\033[";
    format.append(std::to_string(static_cast<int>(foreground)));
    format.append(u8";");
    format.append(std::to_string(static_cast<int>(background)));
    if (styles.size() > 0)
    {
        for (auto it = styles.begin(); it != styles.end(); ++it)
        {
            format.append(u8";");
            format.append(std::to_string(static_cast<int>(*it)));
        }
    }
    format.append(u8"m");
    std::cout << format;
}

void Console::Write(const String &s, ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
{
    EnableVirtualTermimalProcessing();
    SetVirtualTerminalFormat(foreground, background, styles);
    String str = s;
#ifdef WINDOWSLIB
    WString unicode = Strings::StringToWideString(str);
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
    std::cout << str;
#elif defined MACOSLIB
#endif
    ResetTerminalFormat();
}

void Console::WriteLine(const String &s, ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
{
    Write(s, foreground, background, styles);
    std::cout << std::endl;
}

void Console::Write(const WString &s, ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
{
    EnableVirtualTermimalProcessing();
    SetVirtualTerminalFormat(foreground, background, styles);
    WString str = s;

#ifdef WINDOWSLIB
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
#elif defined LINUXLIB
    std::cout << Strings::WideStringToString(str); //NEED TO BE FIXED. ADD locale parameter
#elif defined MACOSLIB
#endif
    ResetTerminalFormat();
}

void Console::WriteLine(const WString &s, ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
{
    Write(s, foreground, background, styles);
    std::cout << std::endl;
}

int main()
{
    std::locale::global(std::locale(u8"en_US.UTF-8"));
    String dataStr = u8"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    WString dataWStr = L"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    std::string locale = u8"";
    //std::string locale = u8"de_DE.UTF-8";
    //std::string locale = u8"en_US.UTF-8";
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToUpper(dataStr);
    dataWStr = Strings::ToUpper(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToLower(dataStr);
    dataWStr = Strings::ToLower(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    Console::PauseAny();
    return 0;
}