Question

I know about ICU and the small utf8 library on The Code Project (I forget its exact name), but neither is quite what I'm after.

What I really want is something like ICU, but wrapped in a friendlier way.

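Specifically, it should:

Be fully object-oriented
Implement the C++ standard streams, or at least something that fills the same role
Format times, dates, etc. in a locale-dependent way (e.g. dd/mm/yy in the UK and mm/dd/yy in the US)
Let me choose the "internal" encoding of strings, so I can make it use UTF-16 on Windows and avoid lots of conversions when passing strings to the Windows API and DirectX
Convert strings easily between encodings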

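If no such library exists, would it be possible to wrap ICU in standard C++ classes, so that I could create a ustring that is used the same way as std::string and std::wstring, and also implement versions of the streams (ideally fully compatible with the existing ones, i.e. I could pass one to a function expecting a std::ostream and it would convert between its internal format and ASCII (or UTF-8))? Roughly how much work would that be?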

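Edit: I also looked at the C++0x standard and noticed UTF-8, UTF-16 and UTF-32 literals. Does this mean the standard library (strings, streams, etc.) will fully support these encodings and conversions between them? If so, does anyone know how long it will take for Visual Studio to support these features?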

EDIT2: As for using the existing C++ support, I'll look into locales and facets.

One problem I've run into is that when I use streams defined around wchar_t (2 bytes under Windows) for file I/O, they still seem to write plain ASCII to the file itself.

std::wofstream file(L"myfile.txt", std::ios::out);
file << L"Hello World!" << std::endl;

produces the following bytes (hex) in the file:
48 65 6C 6C 6F 20 57 6F 72 6C 64 0D 0A
which is clearly ASCII rather than the expected UTF-16 output:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00 0D 00 0A 00
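A wide stream still converts to narrow bytes through the codecvt facet of its locale, and the default facet produces the narrow (here ASCII) encoding. A minimal sketch of one way to get UTF-16LE instead is to imbue the stream with std::codecvt_utf16 before opening it. This assumes a C++11 library; codecvt_utf16 is deprecated since C++17 but still shipped by the major implementations. The function name write_utf16le is hypothetical. The byte-order mark is written by hand as U+FEFF, and the file is opened in binary mode so text-mode newline translation cannot corrupt the 2-byte units.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-16 little-endian with a BOM.
// std::codecvt_utf16 is C++11 and deprecated since C++17.
void write_utf16le(const std::string& path, const std::wstring& text)
{
    std::wofstream file;
    // Imbue before opening so the UTF-16LE facet is used for every write.
    file.imbue(std::locale(file.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>));
    // Binary mode: no text-mode "\n" -> "\r\n" translation on the encoded bytes.
    file.open(path, std::ios::out | std::ios::binary);
    if (!file)
        throw std::runtime_error("write_utf16le: cannot open " + path);
    // U+FEFF encoded little-endian gives the FF FE byte-order mark.
    file << L'\xFEFF' << text;
}   // the destructor flushes and closes the file
```

Because the file is binary, line breaks must be written explicitly, e.g. write_utf16le("myfile.txt", L"Hello World\r\n").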

Answer 1

What I really want is something like ICU, but wrapped in a friendlier way

Unfortunately, there is no such thing. Their API isn't that bad, though, so you can get used to it.

Format times, dates, etc. in a locale-dependent way (e.g. dd/mm/yy in the UK and mm/dd/yy in the US)

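There is full support for this in the std::locale class; read up on how to use it. You can also imbue a std::iostream with a locale so that numbers and dates are formatted correctly.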
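A minimal sketch of locale-dependent date formatting through an imbued stream: std::put_time with "%x" asks the stream's locale for its preferred date representation. The helper name format_date is hypothetical, and the locale names used below are assumptions that depend on what is installed on the system.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats the date part of `when` using `loc`'s conventions.
std::string format_date(const std::tm& when, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);                      // the stream now uses loc's time_put facet
    out << std::put_time(&when, "%x");   // %x = the locale's preferred date format
    return out.str();
}
```

With the classic "C" locale, 15 March 2009 comes out as 03/15/09 (US order). On a POSIX system with the British locale installed, format_date(when, std::locale("en_GB.UTF-8")) should give something like 15/03/09. The std::locale constructor throws std::runtime_error for an unknown name, and locale names differ between platforms.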

Convert strings easily between encodings

std::locale provides facets for converting the 8-bit local encoding to wide characters and back.
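A minimal sketch of the narrow-to-wide direction using the standard std::codecvt<wchar_t, char, std::mbstate_t> facet of a locale. The function name to_wide is hypothetical; the reverse direction would use the same facet's out() member.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts text in `loc`'s narrow (multibyte) encoding to wchar_t.
std::wstring to_wide(const std::string& narrow, const std::locale& loc)
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);

    std::mbstate_t state{};
    // Every wide character consumes at least one narrow byte, so this is enough room.
    std::wstring wide(narrow.size(), L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const auto result = facet.in(state,
                                 narrow.data(), narrow.data() + narrow.size(), from_next,
                                 &wide[0], &wide[0] + wide.size(), to_next);
    if (result == Facet::error || result == Facet::partial)
        throw std::range_error("to_wide: invalid or incomplete multibyte sequence");

    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

Which narrow encoding is decoded depends entirely on the locale passed in; with std::locale::classic() only plain ASCII is guaranteed to convert.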

so I can make it use UTF-16

ICU uses UTF-16 internally, and on Win32 wchar_t and std::wstring are UTF-16 as well. On other operating systems most implementations make wchar_t 32 bits wide, so std::wstring holds UTF-32.
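Because of that difference, getting from std::wstring to the UTF-16 form ICU works with is a plain copy on Windows but needs surrogate pairs elsewhere. A minimal sketch, assuming wchar_t holds UTF-16 when it is 2 bytes and UTF-32 otherwise; the name wstring_to_utf16 is hypothetical, and the code does not validate its input.

```cpp
#include <string>

// Hypothetical helper: converts a std::wstring to UTF-16 code units.
// Assumes 2-byte wchar_t is already UTF-16 and 4-byte wchar_t is UTF-32.
// Invalid code points (negative, > 0x10FFFF, lone surrogates) are not checked.
std::u16string wstring_to_utf16(const std::wstring& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            // Already a single UTF-16 code unit (or part of a pair on Windows).
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary-plane code point: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```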

Remark: support for std::locale isn't perfect, but it already provides many tools that are useful for character operations.

See: http://www.cplusplus.com/reference/std/locale/

Answer 2

This is how I convert between std::string (UTF-8) and std::wstring using ICU:

#include <unicode/ustring.h> // ICU: UChar, u_strFromWCS, u_strToUTF8, u_strFromUTF8, u_strToWCS
#include <stdexcept>
#include <string>
#include <vector>

/** Converts a std::wstring into a std::string with UTF-8 encoding.
 */
template < typename StringT >
StringT utf8 ( std::wstring const & rc_string );

/** Converts a std::string with UTF-8 encoding into a std::wstring.
 */
template < typename StringT >
StringT utf8 ( std::string const & rc_string );

/** Nop specialization for std::string.
 */
template < >
inline std::string utf8 ( std::string const & rc_string )
{
  return rc_string;
}

/** Nop specialization for std::wstring.
 */
template < >
inline std::wstring utf8 ( std::wstring const & rc_string )
{
  return rc_string;
}

// Full specializations are ordinary functions: keep them inline if this lives in a header.
template < >
inline std::string utf8 ( std::wstring const & rc_string )
{
  std::string result;
  if(rc_string.empty())
    return result;

  std::vector<UChar> buffer;

  // One wchar_t yields at most 2 UTF-16 code units (wchar_t is UTF-32 outside Windows)
  // and at most 4 UTF-8 bytes, so these sizes are always large enough.
  buffer.resize(rc_string.size() * 2);
  result.resize(rc_string.size() * 4);

  UErrorCode status = U_ZERO_ERROR;
  int32_t len = 0;

  // wchar_t -> UTF-16
  u_strFromWCS(
    &buffer[0],
    static_cast<int32_t>(buffer.size()),
    &len,
    rc_string.data(),
    static_cast<int32_t>(rc_string.size()),
    &status
  );
  if(U_FAILURE(status))
  {
    throw std::runtime_error("utf8: u_strFromWCS failed");
  }
  buffer.resize(len);

  // UTF-16 -> UTF-8
  u_strToUTF8(
    &result[0],
    static_cast<int32_t>(result.size()),
    &len,
    &buffer[0],
    static_cast<int32_t>(buffer.size()),
    &status
  );
  if(U_FAILURE(status))
  {
    throw std::runtime_error("utf8: u_strToUTF8 failed");
  }
  result.resize(len);

  return result;
}/* end of utf8 ( ) */


template < >
inline std::wstring utf8 ( std::string const & rc_string )
{
  std::wstring result;
  if(rc_string.empty())
    return result;

  std::vector<UChar> buffer;

  // A UTF-8 string never has fewer bytes than UTF-16 code units or wchar_t characters.
  result.resize(rc_string.size());
  buffer.resize(rc_string.size());

  UErrorCode status = U_ZERO_ERROR;
  int32_t len = 0;

  // UTF-8 -> UTF-16
  u_strFromUTF8(
    &buffer[0],
    static_cast<int32_t>(buffer.size()),
    &len,
    rc_string.data(),
    static_cast<int32_t>(rc_string.size()),
    &status
  );
  if(U_FAILURE(status))
  {
    throw std::runtime_error("utf8: u_strFromUTF8 failed");
  }
  buffer.resize(len);

  // UTF-16 -> wchar_t
  u_strToWCS(
    &result[0],
    static_cast<int32_t>(result.size()),
    &len,
    &buffer[0],
    static_cast<int32_t>(buffer.size()),
    &status
  );
  if(U_FAILURE(status))
  {
    throw std::runtime_error("utf8: u_strToWCS failed");
  }
  result.resize(len);

  return result;
}/* end of utf8 ( ) */

Using it is as simple as:

std::string  s  = utf8<std::string>(std::wstring(L"some string"));
std::wstring ws = utf8<std::wstring>(std::string("some string"));
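When sizing the UTF-8 output buffer, note that UTF-8 needs up to 4 bytes per code point; 3 bytes only cover the Basic Multilingual Plane. The following hand-written encoder of a single code point (not ICU, just an illustration; the name encode_utf8 is hypothetical) shows where the 1-to-4-byte range comes from.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                                  // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                          // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                        // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("encode_utf8: surrogate code point");
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                      // 11110xxx + three 10xxxxxx bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("encode_utf8: code point out of range");
    }
    return out;
}
```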

Answer 3

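Formatting dates, times, etc. can be done by specifying a particular locale. As for rolling your own: it's always possible, taking as much as you need from the underlying libraries.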

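I also looked at the C++0x standard and noticed UTF-8, UTF-16 and UTF-32 literals. Does this mean the standard library (strings, streams, etc.) will fully support these encodings and conversions between them?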

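Yes. But note that these are distinct data types (char16_t and char32_t, held in std::u16string and std::u32string), not ordinary wchar_t sequences or std::wstring.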
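A small illustration of those literal types, assuming a C++11 compiler: u"" produces char16_t code units and U"" produces char32_t code units, so the same supplementary-plane character takes two units in one and one unit in the other. (u8"" is left out here because its element type changed from char to char8_t in C++20.) The variable names are illustrative only.

```cpp
#include <string>

// u"..." -> const char16_t[] (UTF-16), U"..." -> const char32_t[] (UTF-32).
// Neither is a wchar_t string, so they go into std::u16string / std::u32string.
const std::u16string smiley_utf16 = u"\U0001F600";  // stored as a surrogate pair
const std::u32string smiley_utf32 = U"\U0001F600";  // stored as one code unit
```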

If so, does anyone know how long it will take for Visual Studio to support these features?

As far as I know, VC9 (VS2008) only partially supports some TR1 features; VC10 (VS2010) is expected to have better support.

Answer 4

I always work this way:

byte stream in some encoding -> ICU -> wistream -> STL & Boost -> wostream -> ICU -> byte stream in some encoding
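A minimal sketch of that pipeline shape. To keep it self-contained, the two ICU steps are swapped for std::wstring_convert with std::codecvt_utf8 (C++11, deprecated since C++17; on Windows it only handles UCS-2), so this shows the structure rather than ICU usage. The middle step is a hypothetical example of wide-stream processing that collapses runs of whitespace; the function name normalize_spaces_utf8 is also hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical pipeline: UTF-8 bytes -> decode -> wistream -> process -> wostream
// -> encode -> UTF-8 bytes. std::wstring_convert stands in for ICU at both ends.
std::string normalize_spaces_utf8(const std::string& utf8_in)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;  // throws std::range_error on bad input

    // Decode: bytes -> wide string -> wide input stream.
    std::wistringstream in(conv.from_bytes(utf8_in));

    // Process in the wide domain: re-join whitespace-separated words with single spaces.
    std::wostringstream out;
    std::wstring word;
    bool first = true;
    while (in >> word) {
        if (!first)
            out << L' ';
        out << word;
        first = false;
    }

    // Encode: wide output -> UTF-8 bytes.
    return conv.to_bytes(out.str());
}
```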

Answer 5

I made my own small wrapper. I can share it if you like.

Answer 6

Good luck. I know the Dinkumware library offers some Unicode support; you can check the documentation on their website. AFAIK it isn't free.
