My first question on SO! :D
I have a std::istream that contains a UTF-16 encoded string. Imagine a UTF-16 encoded text file opened like this:
std::ifstream file( "mytext_utf16.txt", std::ios::binary );
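I want to pass this stream to a function that takes a std::wistream& parameter. I cannot change the file stream type to std::wifstream.
Question: Are there any facilities in the standard library or in Boost that would let me "reinterpret" the istream as a wistream?
I am imagining an adapter class similar to std::wbuffer_convert, except that it should not do any encoding conversion. Basically, for each wchar_t read from the adapter class, it should read two bytes from the associated istream and reinterpret_cast them to wchar_t.
I have created an implementation using boost::iostreams that can be used like this and works like a charm: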
std::ifstream file( "mytext_utf16.txt", std::ios::binary );
// Create an instance of my adapter class.
reinterpret_as_wide_stream< std::ifstream > wfile( &file );
// Read a wstring from file, using the adapter.
std::wstring str;
std::getline( wfile, str );
Why am I asking, then? Because I prefer reusing existing code to reinventing the wheel.
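Answer 0 (score: 3)
Since there are no other answers yet, I am posting my solution, which uses the Boost.Iostreams library. Although it is fairly straightforward, I still think there should be a simpler solution.
First we create a template class that models the Boost.Iostreams device concept and serves as an adapter for an associated narrow device. It forwards read, write and seek operations to the associated device, but adjusts stream position and size values to account for the size difference between the narrow and the wide character types.
"basic_reinterpret_device.h"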
#pragma once
#include <boost/iostreams/traits.hpp>
#include <boost/iostreams/read.hpp>
#include <boost/iostreams/write.hpp>
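#include <boost/iostreams/seek.hpp>
#include <ios>        // std::streamsize, std::streamoff, std::streampos, std::ios_base
#include <stdexcept>  // std::runtime_error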
// CategoryT: boost.iostreams device category tag
// DeviceT : type of associated narrow device
// CharT : (wide) character type of this device adapter
template< typename CategoryT, typename DeviceT, typename CharT >
class basic_reinterpret_device
{
public:
using category = CategoryT; // required by boost::iostreams device concept
using char_type = CharT; // required by boost::iostreams device concept
using associated_device = DeviceT;
using associated_char_type = typename boost::iostreams::char_type_of< DeviceT >::type;
static_assert( sizeof( associated_char_type ) == 1, "Associated device must have a byte-sized char_type" );
// Default constructor.
basic_reinterpret_device() = default;
// Construct from a narrow device
explicit basic_reinterpret_device( DeviceT* pDevice ) :
m_pDevice( pDevice ) {}
// Get the associated device.
DeviceT* get_device() const { return m_pDevice; }
// Read up to n characters from the underlying data source into the buffer s,
// returning the number of characters read; return -1 to indicate EOF
std::streamsize read( char_type* s, std::streamsize n )
{
ThrowIfDeviceNull();
std::streamsize bytesRead = boost::iostreams::read(
*m_pDevice,
reinterpret_cast<associated_char_type*>( s ),
n * sizeof( char_type ) );
if( bytesRead == static_cast<std::streamsize>( -1 ) ) // EOF
return bytesRead;
return bytesRead / sizeof( char_type );
}
// Write up to n characters from the buffer s to the output sequence, returning the
// number of characters written.
std::streamsize write( const char_type* s, std::streamsize n )
{
ThrowIfDeviceNull();
std::streamsize bytesWritten = boost::iostreams::write(
*m_pDevice,
reinterpret_cast<const associated_char_type*>( s ),
n * sizeof( char_type ) );
return bytesWritten / sizeof( char_type );
}
// Advances the read/write head by off characters, returning the new position,
// where the offset is calculated from:
// - the start of the sequence if way == ios_base::beg
// - the current position if way == ios_base::cur
// - the end of the sequence if way == ios_base::end
std::streampos seek( std::streamoff off, std::ios_base::seekdir way )
{
ThrowIfDeviceNull();
std::streampos newPos = boost::iostreams::seek( *m_pDevice, off * sizeof( char_type ), way );
return newPos / sizeof( char_type );
}
protected:
void ThrowIfDeviceNull()
{
if( ! m_pDevice )
throw std::runtime_error( "basic_reinterpret_device - no associated device" );
}
private:
DeviceT* m_pDevice = nullptr;
};
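To simplify the use of this template, we create some alias templates for the most common Boost.Iostreams device tags. Based on these, we create alias templates to build standard-compatible stream buffers and streams.
"reinterpret_stream.h"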
#pragma once
#include "basic_reinterpret_device.h"
#include <boost/iostreams/categories.hpp>
#include <boost/iostreams/traits.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/stream_buffer.hpp>
struct reinterpret_device_tag : virtual boost::iostreams::source_tag, virtual boost::iostreams::sink_tag {};
struct reinterpret_source_seekable_tag : boost::iostreams::device_tag, boost::iostreams::input_seekable {};
struct reinterpret_sink_seekable_tag : boost::iostreams::device_tag, boost::iostreams::output_seekable {};
template< typename DeviceT, typename CharT >
using reinterpret_source = basic_reinterpret_device< boost::iostreams::source_tag, DeviceT, CharT >;
template< typename DeviceT, typename CharT >
using reinterpret_sink = basic_reinterpret_device< boost::iostreams::sink_tag, DeviceT, CharT >;
template< typename DeviceT, typename CharT >
using reinterpret_device = basic_reinterpret_device< reinterpret_device_tag, DeviceT, CharT >;
template< typename DeviceT, typename CharT >
using reinterpret_device_seekable = basic_reinterpret_device< boost::iostreams::seekable_device_tag, DeviceT, CharT >;
template< typename DeviceT, typename CharT >
using reinterpret_source_seekable =
basic_reinterpret_device< reinterpret_source_seekable_tag, DeviceT, CharT >;
template< typename DeviceT, typename CharT >
using reinterpret_sink_seekable =
basic_reinterpret_device< reinterpret_sink_seekable_tag, DeviceT, CharT >;
template< typename DeviceT >
using reinterpret_as_wistreambuf = boost::iostreams::stream_buffer< reinterpret_source_seekable< DeviceT, wchar_t > >;
template< typename DeviceT >
using reinterpret_as_wostreambuf = boost::iostreams::stream_buffer< reinterpret_sink_seekable< DeviceT, wchar_t > >;
template< typename DeviceT >
using reinterpret_as_wstreambuf = boost::iostreams::stream_buffer< reinterpret_device_seekable< DeviceT, wchar_t > >;
template< typename DeviceT >
using reinterpret_as_wistream = boost::iostreams::stream< reinterpret_source_seekable< DeviceT, wchar_t > >;
template< typename DeviceT >
using reinterpret_as_wostream = boost::iostreams::stream< reinterpret_sink_seekable< DeviceT, wchar_t > >;
template< typename DeviceT >
using reinterpret_as_wstream = boost::iostreams::stream< reinterpret_device_seekable< DeviceT, wchar_t > >;
Usage example:
#include "reinterpret_stream.h"
void read_something_as_utf16( std::istream& input )
{
reinterpret_as_wistream< std::istream > winput( &input );
std::wstring wstr;
std::getline( winput, wstr );
}
void write_something_as_utf16( std::ostream& output )
{
reinterpret_as_wostream< std::ostream > woutput( &output );
woutput << L"сайт вопросов и ответов для программистов";
}
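Answer 1 (score: 2)
This is a work in progress.
This is not something you should use as-is, but it may give you a hint of where to start if you have not already thought about doing something like this. If it is not helpful, or if you find a better solution, I am happy to delete or extend this answer.
As far as I understand, you want to read a UTF-8 file and simply cast each single character to wchar_t.
If the standard facilities do too much for your purposes, couldn't you write your own facet?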
#include <codecvt>
#include <locale>
#include <fstream>
#include <cwchar>
#include <iostream>
class MyConvert
{
public:
using state_type = std::mbstate_t;
using result = std::codecvt_base::result;
using From = char;
using To = wchar_t;
bool always_noconv() const throw() {
return false;
}
result in(state_type& __state, const From* __from,
const From* __from_end, const From*& __from_next,
To* __to, To* __to_end, To*& __to_next) const
{
while (__from_next != __from_end) {
*__to_next = static_cast<To>(*__from_next);
++__to_next;
++__from_next;
}
return result::ok;
}
result out(state_type& __state, const To* __from,
const To* __from_end, const To*& __from_next,
From* __to, From* __to_end, From*& __to_next) const
{
while (__from_next < __from_end) {
std::cout << __from << " " << __from_next << " " << __from_end << " " << (void*)__to <<
" " << (void*)__to_next << " " << (void*)__to_end << std::endl;
if (__to_next >= __to_end) {
std::cout << "partial" << std::endl;
std::cout << "__from_next = " << __from_next << " to_next = " <<(void*) __to_next << std::endl;
return result::partial;
}
To* tmp = reinterpret_cast<To*>(__to_next);
*tmp = *__from_next;
++tmp;
++__from_next;
__to_next = reinterpret_cast<From*>(tmp);
}
return result::ok;
}
};
int main() {
std::ofstream of2("test2.out");
std::wbuffer_convert<MyConvert, wchar_t> conv(of2.rdbuf());
std::wostream wof2(&conv);
wof2 << L"сайт вопросов и ответов для программистов";
wof2.flush();
wof2.flush();
}
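This is not something you should use in your code. If this is the right direction, you need to read the documentation, including what is required of such a facet, what all those pointers mean, and how you need to write to them.
If you do want to use something like this, you need to think about which template parameters (if any) you should use for the facet.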
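To make the pointer contract a bit more concrete, here is a hedged sketch of a facet derived from std::codecvt< wchar_t, char, std::mbstate_t >. The from_next and to_next parameters are output parameters: the facet itself is expected to set them to one past the last element it consumed or produced, so the sketch initializes them from from and to before looping. The names reinterpret_codecvt and reinterpret_bytes are hypothetical, the facet copies sizeof(wchar_t) bytes per character in native byte order, and it is exercised directly here rather than through wbuffer_convert.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: copies bytes to and from wchar_t without any conversion.
class reinterpret_codecvt : public std::codecvt< wchar_t, char, std::mbstate_t >
{
    using base = std::codecvt< wchar_t, char, std::mbstate_t >;
    static constexpr std::size_t width = sizeof( wchar_t );

public:
    explicit reinterpret_codecvt( std::size_t refs = 0 ) : base( refs ) {}

protected:
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return static_cast<int>( width ); }
    int do_max_length() const noexcept override { return static_cast<int>( width ); }

    // External bytes -> internal wide characters.
    result do_in( std::mbstate_t&, const char* from, const char* from_end, const char*& from_next,
                  wchar_t* to, wchar_t* to_end, wchar_t*& to_next ) const override
    {
        from_next = from;  // output parameters: the facet must set them itself
        to_next = to;
        while( from_next != from_end )
        {
            if( to_next == to_end || static_cast<std::size_t>( from_end - from_next ) < width )
                return partial;  // output full, or an incomplete trailing character
            std::memcpy( to_next, from_next, width );
            from_next += width;
            ++to_next;
        }
        return ok;
    }

    // Internal wide characters -> external bytes.
    result do_out( std::mbstate_t&, const wchar_t* from, const wchar_t* from_end, const wchar_t*& from_next,
                   char* to, char* to_end, char*& to_next ) const override
    {
        from_next = from;
        to_next = to;
        while( from_next != from_end )
        {
            if( static_cast<std::size_t>( to_end - to_next ) < width )
                return partial;  // not enough room for one whole character
            std::memcpy( to_next, from_next, width );
            to_next += width;
            ++from_next;
        }
        return ok;
    }
};

// Usage: run the facet's public in() over a byte string.
std::wstring reinterpret_bytes( const std::string& bytes )
{
    reinterpret_codecvt cvt;
    std::mbstate_t state{};
    std::wstring out( bytes.size() / sizeof( wchar_t ), L'\0' );
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    cvt.in( state, bytes.data(), bytes.data() + bytes.size(), from_next,
            &out[0], &out[0] + out.size(), to_next );
    out.resize( static_cast<std::size_t>( to_next - &out[0] ) );
    return out;
}
```

Returning partial when the remaining input is shorter than one wide character leaves the leftover bytes unconsumed, so a caller can retry once more data is available.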
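Update: I have now updated my code. The functionality is now closer to what we want. It is not pretty, it is just test code, and I am still not sure why __from_next is not updated (or preserved).
The current problem is that we cannot write to the stream. With gcc we just fall out of sync in wbuffer_convert; with clang we get a SIGILL.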