Question: How to parse a UTF-8 Chinese string

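I'm trying to parse a std::string that may contain Chinese characters. For example, given the string

哈囉hi你好hello

I want to split it into 6 strings: 哈, 囉, hi, 你, 好, hello. In my program the string is read from a text file with getline(). Following the post How to use boost::spirit to parse UTF-8?, this is my current code: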

#include <boost/regex/pending/unicode_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/range.hpp>
#include <iterator>
#include <iostream>
#include <ostream>
#include <cstdint>
#include <string>
#include <vector>

using namespace boost;
using namespace std;
using namespace std::string_literals; 

int main()
{
    string str = u8"哈囉hi你好hello"; //actually got from getline()
    auto &&utf8_text = str;

    u8_to_u32_iterator<const char*>
        tbegin(begin(utf8_text)), tend(end(utf8_text));

    vector<uint32_t> result;
    spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result);
    for(auto &&code_point : result) {
        cout << code_point << ";";
    }
}

But I get the error: call to 'begin' and 'end' is ambiguous. It works when I declare auto &&utf8_text = u8"哈囉hi你好hello" directly, but I can't write it that way because the content of the string comes from getline().

I also tried this:

auto str = u8"你好，世界！";
auto &&utf8_text = str;

But I still get the error: no matching function for call to 'begin' and 'end'.

Answer 1

Using auto with a string literal deduces a character pointer, not a std::string. If you want a std::string, you have to spell the type out.
