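How can I make Unicode iostream i/o work in both Windows and Unix-land?

Time: 2015-05-12 17:31:53

Tags: c++ windows unicode console iostream

Note: This is a question-with-answer, posted to document a technique that others may find useful, and perhaps to learn about others' even better solutions. Feel free to add critique or questions as comments. Also feel free to add additional answers. :)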

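Problem #1:

  • Console support for Unicode via streams is severely limited at the Windows API level. The only relevant code page available to ordinary desktop applications is 65001, UTF-8. With that code page, interactive input fails at the API level, and even output of non-ASCII characters fails, and the C++ standard library implementations provide no workaround for this.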
 
#include <iostream>
#include <string>
using namespace std;

auto main() -> int
{
    wstring username;
    wcout << L"Hi, what’s your name? ";
    getline( wcin, username );
    wcout << "Pleased to meet you, " << username << "!\n";
}

H:\personal\web\blog alf on programming at wordpress\002\code>chcp 65001
Active code page: 65001

H:\personal\web\blog alf on programming at wordpress\002\code>g++ problem.input.cpp -std=c++14

H:\personal\web\blog alf on programming at wordpress\002\code>a
Hi, whatSøren Moskégård
                             ← No visible output.
H:\personal\web\blog alf on programming at wordpress\002\code>_

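At the Windows API level, the solution is to use non-stream-based direct console i/o when the relevant standard stream is bound to the console, for example via the WriteConsole API function. As an extension supported by both the Visual C++ and MinGW g++ standard libraries, a mode can be set for the standard wide streams in which WriteConsole is used, and there is also a mode for converting to and from UTF-8 as the external encoding.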
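
To illustrate the direct console i/o idea, here is a minimal sketch. It is not the code used in the answer below. The function name `write_direct_to_console` is hypothetical; `GetStdHandle`, `GetConsoleMode` and `WriteConsoleW` are the Windows API functions. The sketch assumes that `wchar_t` text is UTF-16, which holds in Windows, and outside Windows it simply reports failure.

```cpp
#include <string>

#ifdef _WIN32
#   include <windows.h>     // GetStdHandle, GetConsoleMode, WriteConsoleW
#endif

// Hypothetical helper: writes `text` straight to the console, bypassing the
// byte-oriented stream layer. Returns false when standard output is not a
// console (or when not compiled for Windows), in which case the caller should
// fall back to ordinary stream output.
inline auto write_direct_to_console( const std::wstring& text )
    -> bool
{
#ifdef _WIN32
    const HANDLE output = GetStdHandle( STD_OUTPUT_HANDLE );
    DWORD mode = 0;
    if( output == INVALID_HANDLE_VALUE || !GetConsoleMode( output, &mode ) )
    {
        return false;       // Redirected to a file or pipe: not a console.
    }
    DWORD n_written = 0;
    const BOOL succeeded = WriteConsoleW(
        output, text.data(), static_cast<DWORD>( text.size() ), &n_written, nullptr
        );
    return succeeded != 0 && n_written == text.size();
#else
    (void) text;            // Assumption: no direct console i/o outside Windows.
    return false;
#endif
}
```

The `GetConsoleMode` check is what distinguishes a console from a redirected stream; that is the same distinction the `_isatty` call makes in the answer's mode setter.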

In Unix-land a single call to setlocale( LC_ALL, "" ), or its higher-level C++ equivalent, suffices to make wide streams work.
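
A minimal sketch of the Unix-land side, assuming the user's environment specifies a UTF-8 locale. The helper names `use_environment_locale`, `greeting_for` and `greet` are hypothetical, introduced only for this illustration.

```cpp
#include <clocale>      // std::setlocale, LC_ALL
#include <iostream>     // std::wcout
#include <string>       // std::wstring

// Hypothetical helper: adopts the user's locale from the environment.
// Returns false if the C library rejects that locale.
inline auto use_environment_locale()
    -> bool
{ return std::setlocale( LC_ALL, "" ) != nullptr; }

// Hypothetical helper: builds the greeting from the example program above.
inline auto greeting_for( const std::wstring& username )
    -> std::wstring
{ return L"Pleased to meet you, " + username + L"!\n"; }

// Hypothetical helper: sets the locale, then writes the greeting to std::wcout.
// The locale call should come before the first use of the wide streams.
inline void greet( const std::wstring& username )
{
    use_environment_locale();
    std::wcout << greeting_for( username );
}
```

Calling `greet( L"Søren" )` as the first output operation of a program is the intended usage. Whether the non-ASCII characters display correctly still depends on the terminal's encoding matching the locale.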

But how can such modes be set transparently and automatically, so that the same ordinary standard C++ code using wide streams works in both Windows and Unix-land?

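Note, for those readers who shudder at the thought of using wide text in a Unix-land program: this is in effect a prerequisite for portable code that uses UTF-8 narrow text console i/o in Unix-land. Namely, code that automatically uses UTF-8 narrow text in Unix-land and wide text in Windows becomes possible, and can be built on top of this support for Unicode in Windows. But without such support there is no portability in the general case.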

Problem #2:

  • With wide streams, the default conversion of output items to wchar_t const* does not kick in.
 
#include <iostream>
using namespace std;

struct Byte_string
{ operator char const* () const { return "Hurray, it works!"; } };

struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };

auto main() -> int
{
    wcout << "Byte string pointer: " << Byte_string() << endl;
    wcout << "Wide string pointer: " << Wide_string() << endl;
}

Byte string pointer: Hurray, it works!
Wide string pointer: 0x4ad018

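This is a defect in the standard, of the inconsistency-at-the-implementation-level kind, which I reported long ago. I'm not sure of its status: it may have been forgotten (I never received any mail about it), or a fix may be applied in C++17. Anyway, how can one work around it?

In short, how can one make standard C++ code that uses Unicode wide text console i/o work, and be practical, in both Windows and Unix-land?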

1 Answer:

Answer 0: (score: 8)

Fix for the conversion problem:

cppx/stdlib/iostreams_conversion_defect.fix.hpp
#pragma once
//----------------------------------------------------------------------------------------
//    PROBLEM DESCRIPTION.
//
//    Output of wchar_t const* is only supported via an operator<< template. User-defined
//    conversions are not considered for template matching. This results in actual argument
//    with user conversion to wchar_t const*, for a wide stream, being presented as the
//    pointer value instead of the string.

#include <iostream>

#ifndef CPPX_NO_IOSTREAM_CONVERSION_FIX
    namespace std{
        template< class Char_traits >
        inline auto operator<<(
            basic_ostream<wchar_t, Char_traits>&    stream,
            wchar_t const                           ch
            )
            -> basic_ostream<wchar_t, Char_traits>&
        { return operator<< <wchar_t, Char_traits>( stream, ch ); }

        template< class Char_traits >
        inline auto operator<<(
            basic_ostream<wchar_t, Char_traits>&    stream,
            wchar_t const* const                    s
            )
            -> basic_ostream<wchar_t, Char_traits>&
        { return operator<< <wchar_t, Char_traits>( stream, s ); }
    }  // namespace std
#endif
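
Where adding overloads to namespace std is not wanted, an explicit conversion at the call site is an alternative workaround. The following is a minimal sketch of that alternative, not the technique used by the header above; `formatted_explicitly` is a hypothetical helper name, and `Wide_string` is the class from the question.

```cpp
#include <sstream>
#include <string>

struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };

// Hypothetical helper: formats the item via an explicit conversion, so that the
// standard operator<< template for wchar_t const* is matched directly instead
// of the void const* overload that prints the pointer value.
inline auto formatted_explicitly( const Wide_string& item )
    -> std::wstring
{
    std::wostringstream stream;
    stream << static_cast<wchar_t const*>( item );
    return stream.str();
}
```

The cost is that every output site has to spell out the conversion, which is exactly the burden that the header above removes.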

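Setting direct i/o mode in Windows:

This is a standard library extension that is supported by both Visual C++ and MinGW g++.

First, just because it's used in the code, the definition of the Ptr type builder (the main drawback of library-provided type builders is that ordinary type deduction doesn't kick in, i.e. in some cases it's still necessary to use the raw operator notation):

cppx/core_language/type_builders.hpp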
⋮
    template< class T >         using Ptr           = T*;
⋮

A helper definition, because it's used in more than one file:

cppx/stdlib/Iostream_mode.hpp
#pragma once
// Mode for a possibly console-attached iostream, such as std::wcout.

namespace cppx {
    enum Iostream_mode: int { unknown, utf_8, direct_io };
}  // namespace cppx

Mode setters (base functionality):

cppx/stdlib/impl/utf8_mode.for_windows.hpp
#pragma once
// UTF-8 mode for a stream in Windows.
#ifndef _WIN32
#   error This is a Windows only implementation.
#endif

#include <cppx/stdlib/Iostream_mode.hpp>

#include <stdio.h>      // FILE, stdin, stdout, stderr, etc.

// Non-standard headers, which are de facto standard in Windows:
#include <io.h>         // _setmode, _isatty, _fileno etc.
#include <fcntl.h>      // _O_WTEXT etc.

namespace cppx {

    inline
    auto set_utf8_mode( const Ptr< FILE > f )
        -> Iostream_mode
    {
        const int file_number = _fileno( f );       // See docs for error handling.
        if( file_number == -1 ) { return Iostream_mode::unknown; }
        const int new_mode = (_isatty( file_number )? _O_WTEXT : _O_U8TEXT);
        const int previous_mode = _setmode( file_number, new_mode );
        return (0?Iostream_mode()
            : previous_mode == -1?      Iostream_mode::unknown
            : new_mode == _O_WTEXT?     Iostream_mode::direct_io
            :                           Iostream_mode::utf_8
            );
    }

}  // namespace cppx
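
The chained conditional expression at the end of set_utf8_mode can be hard to read at first sight. As a hedged restatement only, here is the same decision as a pure function that can be checked without a console. The names `Mode` and `resulting_mode` and the three boolean parameters are hypothetical stand-ins for `Iostream_mode` and for the results of `_fileno`, `_isatty` and `_setmode`.

```cpp
// Hypothetical mirror of cppx::Iostream_mode, so that the sketch stands alone.
enum class Mode { unknown, utf_8, direct_io };

// Hypothetical pure restatement of the decision made in set_utf8_mode:
//   * no file number, or _setmode failed   -> unknown
//   * stream is attached to a console      -> direct_io (the _O_WTEXT mode)
//   * stream is redirected to file or pipe -> utf_8     (the _O_U8TEXT mode)
constexpr auto resulting_mode(
    const bool has_file_number,
    const bool is_console,
    const bool setmode_succeeded
    )
    -> Mode
{
    return !has_file_number     ? Mode::unknown
        :  !setmode_succeeded   ? Mode::unknown
        :  is_console           ? Mode::direct_io
        :                         Mode::utf_8;
}
```

For example, `resulting_mode( true, false, true )` corresponds to output redirected to a file, where UTF-8 is used as the external encoding.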

cppx/stdlib/impl/utf8_mode.generic.hpp
#pragma once
#include <stdio.h>      // FILE, stdin, stdout, stderr, etc.
#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr

namespace cppx {

    inline
    auto set_utf8_mode( const Ptr< FILE > )
        -> Iostream_mode
    { return Iostream_mode::unknown; }

}  // namespace cppx

cppx/stdlib/utf8_mode.hpp
#pragma once
// UTF-8 mode for a stream. For Unix-land this is a no-op & the locale must be UTF-8.

#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr
#include <cppx/stdlib/Iostream_mode.hpp>

#include <stdio.h>      // FILE, needed by the declaration below.

namespace cppx {
    inline
    auto set_utf8_mode( const Ptr< FILE > ) -> Iostream_mode;
}  // namespace cppx

#ifdef _WIN32   // This also covers 64-bit Windows.
#   include "impl/utf8_mode.for_windows.hpp"    // Using Windows-specific _setmode.
#else
#   include "impl/utf8_mode.generic.hpp"        // A do-nothing implementation.
#endif

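Configuring the standard streams.

In addition to setting direct console i/o mode or UTF-8 as appropriate in Windows, this fixes the implicit conversion defect; (indirectly) calls setlocale so that wide streams work in Unix-land; sets boolalpha for good measure, as a more reasonable default; and includes all standard library headers that have to do with iostreams (I don't show the separate header file that does that, and it is to a degree a matter of personal preference how much to include, or whether to do such inclusion at all):

cppx/stdlib/iostreams.hpp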
#pragma once
// Standard iostreams but configured to work, plus, as utility, with boolalpha set.

#include <raw_stdlib/iostreams.hpp>         // <iostream>, <sstream>, <fstream> etc. for convenience.

#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr
#include <cppx/stdlib/utf8_mode.hpp>        // stdin etc., cppx::set_utf8_mode
#include <locale>                           // std::locale
#include <string>                           // std::string

#include <cppx/stdlib/impl/iostreams_conversion_defect.fix.hpp> // Support arg conv.

inline auto operator<< ( std::wostream& stream, const std::string& s )
    -> std::wostream&
{ return (stream << s.c_str()); }

// The following code's sole purpose is to automatically initialize the streams.
namespace cppx { namespace utf8_iostreams {
    using std::locale;
    using std::ostream;
    using std::cin; using std::cout; using std::cerr; using std::clog;
    using std::wostream;
    using std::wcin; using std::wcout; using std::wcerr; using std::wclog;
    using std::boolalpha;

    namespace detail {
        using std::wstreambuf;

        // Based on "Filtering streambufs" code by James Kanze published at
        // <url: http://gabisoft.free.fr/articles/fltrsbf1.html>.
        class Correcting_input_buffer
            : public wstreambuf
        {
        private:
            wstreambuf*     provider_;
            wchar_t         buffer_;

        protected:
            auto underflow()
                -> int_type override
            {
                if( gptr() < egptr() )  { return *gptr(); }

                const int_type result = provider_->sbumpc();
                if( result == L'\n' )
                {
                    // Ad hoc workaround for g++ extra newline undesirable behavior:
                    provider_->pubsync();
                }

                if( traits_type::not_eof( result ) )
                {
                    buffer_ = result;
                    setg( &buffer_, &buffer_, &buffer_ + 1 );
                }
                return result ;
            }

        public:
            Correcting_input_buffer( wstreambuf* a_provider )
                : provider_( a_provider )
            {}
        };
    }  // namespace detail

    class Usage
    {
    private:
        static
        void init_once()
        {
            // In Windows there is no UTF-8 encoding spec for the locale, in Unix-land
            // it's the default. From Microsoft's documentation: "If you provide a code
            // page like UTF-7 or UTF-8, setlocale will fail, returning NULL". Still
            // this call is essential for making the wide streams work correctly in
            // Unix-land.
            locale::global( locale( "" ) ); // Effects a `setlocale( LC_ALL, "" )`.

            for( const Ptr<FILE> c_stream : {stdin, stdout, stderr} )
            {
                const auto new_mode = set_utf8_mode( c_stream );
                if( c_stream == stdin && new_mode == Iostream_mode::direct_io )
                {
                    static detail::Correcting_input_buffer  correcting_buffer( wcin.rdbuf() );
                    wcin.rdbuf( &correcting_buffer );
                }
            }

            for( const Ptr<ostream> stream_ptr : {&cout, &cerr, &clog} )
            {
                *stream_ptr << boolalpha;
            }

            for( const Ptr<wostream> stream_ptr : {&wcout, &wcerr, &wclog} )
            {
                *stream_ptr << boolalpha;
            }
        }

    public:
        Usage()
        { static const bool dummy = (init_once(), true); (void) dummy; }
    };

    namespace detail {
        const Usage usage;
    }  // namespace detail

}}  // namespace cppx::utf8_iostreams

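Both example programs in the question are fixed simply by including the above header instead of, or in addition to, <iostream>. When it is included in addition, it can be in a separate translation unit (except for the implicit conversion defect fix; if that is wanted, its header must be included somehow). Or, for example, it can be a forced include in the build command.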
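
The Correcting_input_buffer class in the header above uses a one-character filtering stream buffer. To show that scheme in isolation, here is a minimal sketch that only passes characters through, without the g++-specific newline workaround. The names `Pass_through_input_buffer` and `line_read_through_filter` are hypothetical, and a string stream stands in for the real std::wcin buffer.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical class: forwards input from `provider_` one character at a time.
class Pass_through_input_buffer
    : public std::wstreambuf
{
private:
    std::wstreambuf*    provider_;
    wchar_t             buffer_;

protected:
    auto underflow()
        -> int_type override
    {
        if( gptr() < egptr() ) { return traits_type::to_int_type( *gptr() ); }

        const int_type result = provider_->sbumpc();
        if( !traits_type::eq_int_type( result, traits_type::eof() ) )
        {
            // Expose exactly one character as the current get area.
            buffer_ = traits_type::to_char_type( result );
            setg( &buffer_, &buffer_, &buffer_ + 1 );
        }
        return result;
    }

public:
    explicit Pass_through_input_buffer( std::wstreambuf* a_provider )
        : provider_( a_provider ), buffer_()
    {}
};

// Hypothetical helper: reads the first line of `text` through the filter.
inline auto line_read_through_filter( const std::wstring& text )
    -> std::wstring
{
    std::wistringstream source( text );
    Pass_through_input_buffer filter( source.rdbuf() );
    std::wistream input( &filter );
    std::wstring line;
    std::getline( input, line );
    return line;
}
```

A filter of this shape is the natural place for a per-character correction, which is how the header above hooks in its `pubsync` call after each newline.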