Question

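I have a std::wstring variable that contains some text, and I need to split it by a separator. How can I do this? I would rather not use Boost, because it generates some warnings. Thanks.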

EDIT 1: here is a sample text:

hi how are you?

And this is the code:

typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;

boost::char_separator<wchar_t> sep;

Tok tok(this->m_inputText, sep);

for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
    std::wcout << *tok_iter << L'\n';  // tokens are std::wstring, so print them through the wide stream
}

The result is:

hi
how
are
you
?

I don't understand why the last character is always split off into a separate token...

Answer 1

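In your code, the question mark ends up on a separate line because that is how boost::tokenizer works by default.

If the output you want is four tokens ("hi", "how", "are", and "you?"), you could

a) change the char_separator you are using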

boost::char_separator<wchar_t> sep(L" ", L"");

b) use boost::split, which I think is the most direct answer to "split a wstring by a specified character"

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main()
{
    std::wstring m_inputText = L"hi how are you?";

    // Split on spaces; each space-separated piece becomes one element of tok.
    std::vector<std::wstring> tok;
    boost::split(tok, m_inputText, boost::is_any_of(L" "));

    for (std::vector<std::wstring>::iterator tok_iter = tok.begin();
         tok_iter != tok.end(); ++tok_iter)
    {
        std::wcout << *tok_iter << '\n';
    }
}

Test run: https://ideone.com/jOeH9
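
Since the question asks to avoid Boost, here is a minimal standard-library sketch of the same single-character split. The helper name split_wstring is hypothetical and is not part of the answer above; it is one possible approach under the assumption that a single wchar_t delimiter is enough.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): splits `text` on every
// occurrence of `delim` using std::getline on a wide string stream.
// Empty tokens between two adjacent delimiters are kept; a single trailing
// delimiter does not produce a trailing empty token.
std::vector<std::wstring> split_wstring(const std::wstring& text, wchar_t delim)
{
    std::vector<std::wstring> tokens;
    std::wistringstream stream(text);
    std::wstring token;
    while (std::getline(stream, token, delim))
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling split_wstring(L"hi how are you?", L' ') should give the four tokens "hi", "how", "are" and "you?", because only the space is treated as a separator.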

Answer 2

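You are default-constructing boost::char_separator. The documentation says:

The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.

Since std::ispunct(L'?') is true, the question mark is treated as a "kept" delimiter and is reported as a separate token.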
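
To make the rule quoted above concrete, here is a rough sketch that imitates it with the standard wide-character classifiers std::iswspace and std::iswpunct. This is not Boost's implementation; the function name tokenize_like_default is hypothetical, and using the wide-character counterparts of isspace/ispunct is an assumption made for illustration.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper imitating the documented default rule:
// whitespace is a dropped delimiter, punctuation is a kept delimiter,
// and empty tokens are dropped. Not Boost's actual code.
std::vector<std::wstring> tokenize_like_default(const std::wstring& text)
{
    std::vector<std::wstring> tokens;
    std::wstring current;
    for (wchar_t ch : text)
    {
        const bool dropped = std::iswspace(ch) != 0;
        const bool kept = std::iswpunct(ch) != 0;
        if (dropped || kept)
        {
            if (!current.empty())  // empty tokens are never emitted
            {
                tokens.push_back(current);
                current.clear();
            }
            if (kept)
            {
                // a kept delimiter becomes a token of its own
                tokens.push_back(std::wstring(1, ch));
            }
        }
        else
        {
            current += ch;
        }
    }
    if (!current.empty())
    {
        tokens.push_back(current);
    }
    return tokens;
}
```

With the input L"hi how are you?" this sketch yields five tokens, the last one being "?", which matches the output shown in the question.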

Answer 3

Hi, you can use the wcstok function.
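
A minimal sketch of how wcstok could be used here, assuming the standard three-argument form declared in <cwchar> (some older toolchains shipped a two-argument variant instead). The wrapper name split_with_wcstok is hypothetical. Because wcstok writes terminators into its input, the sketch tokenizes a mutable copy rather than the original string.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical wrapper around std::wcstok (three-argument form assumed).
// `delims` lists every character that should act as a separator.
std::vector<std::wstring> split_with_wcstok(const std::wstring& text,
                                            const wchar_t* delims)
{
    // wcstok modifies its input, so copy the text into a writable,
    // null-terminated buffer first.
    std::vector<wchar_t> buffer(text.begin(), text.end());
    buffer.push_back(L'\0');

    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr;  // wcstok keeps its position here between calls
    for (wchar_t* tok = std::wcstok(buffer.data(), delims, &state);
         tok != nullptr;
         tok = std::wcstok(nullptr, delims, &state))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Note that wcstok skips runs of consecutive delimiters, so this variant never returns empty tokens.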

Answer 4

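You said you don't want Boost, so...

This may be an odd approach to use in C++, but I once used it in a MUD where I needed a lot of tokenization in C.

Take this block of memory, assigned to the char * chars:

char chars[] = "I like to fiddle with memory";

If you need to tokenize on the space character: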

create an array of char* called splitvalues, big enough to store all tokens
while incrementing the pointer chars, until the value just read is '\0'
  if not already set, set splitvalues[counter] to the current memory address - 1
  if the value is ' ', write 0 there
    and increment counter

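When you are done, the original string has been destroyed, so do not use it; instead, use your array of strings that point to the tokens. The number of tokens is the counter variable (the upper bound of the array).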

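The approach is:

iterate over the string and, at the first character of each token, update the token start pointer
convert each character you split on into a zero, which terminates a string in C
count how many times you did this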
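
The steps above can be sketched roughly as follows. This is a variant written for illustration, not the answer's exact loop: the name split_in_place is hypothetical, a std::vector replaces the pre-sized array, and the token count is simply the size of the returned vector.

```cpp
#include <vector>

// Hypothetical helper: overwrites each delimiter in `chars` with '\0' and
// returns pointers to the start of every token inside the same buffer.
// The buffer must be writable and must outlive the returned pointers.
std::vector<char*> split_in_place(char* chars, char delim)
{
    std::vector<char*> splitvalues;
    char* token_start = nullptr;  // nullptr means "not inside a token"
    for (char* p = chars; *p != '\0'; ++p)
    {
        if (*p == delim)
        {
            *p = '\0';            // terminate the current token, if any
            token_start = nullptr;
        }
        else if (token_start == nullptr)
        {
            token_start = p;      // first character of a new token
            splitvalues.push_back(p);
        }
    }
    return splitvalues;
}
```

For the buffer "I like to fiddle with memory" and a space delimiter, this returns six pointers, each to a null-terminated word inside the original array.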

PS. I am not sure whether a similar approach can be used in a Unicode environment.
