Question

I've run into a problem that I can't find an answer to online. I found many similar questions, but none of the answers worked for me. Part of my code is:


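What I need is to read a UTF-8 encoded text file in C++. But when I read the file, the wide strings get altered, and when I print them the output text is completely different.

Input: ĄČĘ
Output: ÄÄÄ

How can I avoid this and read the text correctly? I'm developing with Visual Studio 2015, C++, on Windows 10.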

Answer 1

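UTF-8 text is (probably) not meant to live in wide strings. Read "UTF-8 Everywhere". UTF-8 uses 8-bit bytes (sometimes several of them) to encode a Unicode character. So in C++ a Unicode character is parsed from a sequence of 1 to 4 bytes (i.e. char-s); the original UTF-8 design allowed up to 6, but RFC 3629 restricts it to 4.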
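To make the "several bytes per character" point concrete, here is a minimal sketch that classifies a sequence by its first byte. The name `utf8_sequence_length` is a hypothetical helper, not part of any library. It also hints at why the question shows ÄÄÄ: a character such as Ą is stored as the two bytes 0xC4 0x84, and 0xC4 read on its own as Latin-1 is Ä.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in a UTF-8 sequence, judged only by
// its first byte. Returns 0 for a byte that cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;                   // 0xxxxxxx: plain ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx (0xC0/0xC1 would be overlong)
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx, up to U+10FFFF
    return 0;                                    // continuation byte or invalid lead
}
```

For example, `utf8_sequence_length(0xC4)` reports a two-byte sequence, so the byte that follows must be consumed together with it.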

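You need some UTF-8 parser, and the C11 and C++11 standards give you very little here (C++11 has std::wstring_convert with std::codecvt_utf8, but both were deprecated in C++17). So you will probably want an external library. Look into libunistring (a simple C library for parsing UTF-8) or something else (Qt, POCO, Glib, ICU, ...). You could decide to parse UTF-8 and convert it to wide UTF-32 (using u32string-s and char32_t) and convert back on output, or, better, decide to work internally in UTF-8 (using std::string and char).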
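If you go the UTF-32 route without a library, the conversion can be sketched by hand. This is only an illustration under stated assumptions: `utf8_to_utf32` and `utf32_to_utf8` are hypothetical names, and the decoder checks the basic sequence structure but, for brevity, does not reject overlong forms or surrogates. A real library such as ICU or libunistring does all of that for you.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode UTF-32 code points back into UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::runtime_error("code point out of Unicode range");
        }
    }
    return out;
}
```

Decoding the six bytes of "ĄČĘ" this way gives three code points (U+0104, U+010C, U+0118), and encoding them again gives back the same six bytes.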

So you'll parse and print sequences of char-s (in UTF-8 encoding), and your program will use plain std::string-s and plain char-s (not std::wstring or wchar_t), while processing them as UTF-8 sequences ...
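One thing to keep in mind when staying in std::string: size() counts bytes, not characters. A rough sketch of counting code points without decoding them, assuming the input is valid UTF-8 (`utf8_code_point_count` is a hypothetical name):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a valid UTF-8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_code_point_count(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "ĄČĘ" stored as UTF-8, size() is 6 while this function gives 3.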

Answer 2

This is easy with Boost.Spirit:

#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

using namespace boost::spirit;

int main()
{
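    // UTF-8 bytes, provided this source file is saved as UTF-8
    // (with MSVC, compile with /utf-8 so the literal is not re-encoded).
    std::string in("ąčę");
    std::string out;
    // +unicode::char_ matches one or more characters and appends them to out.
    qi::parse(in.begin(), in.end(), +unicode::char_, out);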
    std::cout << out << std::endl;
}

The following example reads a sequence of tuples (book, authors, takenBy):

#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>

using namespace boost::spirit;

int main()
{
    std::string in("Book_1\nAuthors_1\nTakenBy_1\n"\
                   "Book ąčę\nAuthors_2\nTakenBy_2\n");
    std::vector<
        std::tuple<
            std::string, /* book */
            std::string, /* authors */
            std::string  /* takenBy */
        > 
    > out;
    auto ok = qi::parse(in.begin(), in.end(),
                        *(
                               +(unicode::char_ - qi::eol) >> qi::eol /* book */
                            >> +(unicode::char_ - qi::eol) >> qi::eol /* authors */
                            >> +(unicode::char_ - qi::eol) >> qi::eol /* takenBy */
                        ),
                        out);
    if(ok)
    {
        for(auto& entry : out)
        {
            std::string book, authors, takenBy;
            std::tie(book, authors, takenBy) = entry;
            std::cout << "book: "    << book    << std::endl
                      << "authors: " << authors << std::endl
                      << "takenBy: " << takenBy << std::endl;
        }
    }
}

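It's just a demo that uses std::tuple and an unnamed parser, which is the third argument of qi::parse. You can use a struct instead of a tuple to represent books, authors, genres and so on. The unnamed parser can be replaced with a grammar, and you can read the file into a string to pass to qi::parse.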
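For the last step, reading the file into a string, something along these lines should do. It is a sketch using only the standard library; `read_utf8_file` is a hypothetical name, and skipping a leading UTF-8 byte order mark is my assumption about what files saved by Windows editors may contain.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes into a std::string.
// Binary mode keeps the stream from translating line endings.
std::string read_utf8_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);
    std::string text((std::istreambuf_iterator<char>(file)),
                     std::istreambuf_iterator<char>());
    // Drop a leading UTF-8 byte order mark (EF BB BF) if present.
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0)
        text.erase(0, 3);
    return text;
}
```

The returned string can then be passed to qi::parse through its begin() and end() iterators, the same way `in` is used in the examples above.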

Reading and printing UTF-8 symbols from a file in C++
