Question

I am using boost::split to parse a data file. The data file contains lines like the following.

data.txt

1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3

The white space between the items is tabs. The code I use to split the line above is as follows.

std::string buf;
/*Assign the line from the file to buf*/
std::vector<std::string> dataLine;
boost::split( dataLine, buf , boost::is_any_of("\t "), boost::token_compress_on);       //Split data line
cout << dataLine.size() << endl;

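For the line above I should get 5 printed, but I get 6. I have read the documentation, and this approach seems like it should do what I want, so clearly I am missing something. Thanks!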

Edit: running a for loop over dataLine, as shown below, prints the following:

cout << "****" << endl;
for(int i = 0 ; i < dataLine.size() ; i ++) cout << dataLine[i] << endl;
cout << "****" << endl;


****
1:1~15
ASTKGPSVFPLAPSS
SVFPLAPSS
-12.6
98.3

****

Answer 1

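Even though adjacent delimiters are merged together, trailing delimiters still seem to cause the problem: even when they are treated as one, that is still one delimiter, so an empty token follows it.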
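To see why the count comes out as 6, here is a small sketch of that behaviour using only the standard library. It is a hand-written stand-in, not Boost's implementation, and the name split_compress is made up for this illustration. A run of delimiters counts as a single separator, but whatever sits after the last separator is still emitted as a token, even if it is empty.

```cpp
#include <string>
#include <vector>

// Hand-written stand-in, NOT Boost's implementation: splits `text` on any
// character in `delims`, treating a run of adjacent delimiters as one.
std::vector<std::string> split_compress(const std::string& text,
                                        const std::string& delims) {
    std::vector<std::string> tokens;
    std::string current;
    bool in_delimiter_run = false;
    for (char c : text) {
        if (delims.find(c) != std::string::npos) {
            if (!in_delimiter_run) {
                // The first delimiter of a run closes the current token,
                // even when that token is empty (leading delimiter).
                tokens.push_back(current);
                current.clear();
                in_delimiter_run = true;
            }
        } else {
            current.push_back(c);
            in_delimiter_run = false;
        }
    }
    // Whatever follows the last delimiter run is a token too; after a
    // trailing run this is the empty string.
    tokens.push_back(current);
    return tokens;
}
```

Called on a line that ends in tabs or spaces, like the one in the question, this sketch returns the five fields plus one empty string at the end, which matches the blank line in the printed output.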

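So split() alone cannot solve your problem. Fortunately, Boost String Algo has trim() and trim_if(), which strip whitespace or delimiters from the beginning and end of a string. So just call trim() on buf, like this: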

std::string buf = "1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3    ";
std::vector<std::string> dataLine;
boost::trim_if(buf, boost::is_any_of("\t ")); // could also use plain boost::trim
boost::split(dataLine, buf, boost::is_any_of("\t "), boost::token_compress_on);
std::cout << dataLine.size() << std::endl;
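If it helps to see what the trimming step amounts to, here is a rough standard-library sketch. The helper name trim_any_of is hypothetical and this is not how Boost implements it. Unlike boost::trim_if, which modifies the string in place, this version returns a trimmed copy.

```cpp
#include <string>

// Hypothetical helper: returns `text` without any leading or trailing
// characters that appear in `chars`. Interior characters are untouched.
std::string trim_any_of(const std::string& text, const std::string& chars) {
    const std::string::size_type first = text.find_first_not_of(chars);
    if (first == std::string::npos) {
        return std::string();  // nothing but trimmable characters (or empty)
    }
    const std::string::size_type last = text.find_last_not_of(chars);
    return text.substr(first, last - first + 1);
}
```

Once the ends are clean, the only delimiter runs left are the ones between fields, so a compressing split has no empty token to produce.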

This has been asked before: boost::split leaves empty tokens at the beginning and end of string - is this desired behaviour?

Answer 2

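I would suggest the C++ String Toolkit Library. In my opinion this library is much faster than Boost. I used to use Boost to split (aka tokenize) a line of text, but found this library to be much more in line with what I want.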

One of the great things about strtk::parse is that it converts the tokens to their final values and checks the number of elements.

You can use it like this:

std::vector<std::string> tokens;

// multiple delimiters should be treated as one
if( !strtk::parse( dataLine, "\t", tokens ) )
{
    std::cout << "failed" << std::endl;
}

--- another version

std::string token1;
std::string token2;
std::string token3;
float value1;
float value2;

if( !strtk::parse( dataLine, "\t", token1, token2, token3, value1, value2) )
{
     std::cout << "failed" << std::endl;
     // fails if the number of elements is not what you want
}

Online documentation for the library: String Tokenizer Documentation. Link to the source code: C++ String Toolkit Library

Answer 3

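boost::split deliberately does not discard leading and trailing whitespace, because it cannot know whether it is significant. The solution is to call boost::trim before calling boost::split.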

#include <boost/algorithm/string/trim.hpp>

....

boost::trim(buf);
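If trimming the input first is not convenient, another option (not from the answers above, just a common idiom) is to leave the split as it is and remove the empty tokens afterwards. A minimal sketch with a hypothetical helper name, using only the standard library:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes every empty string from `tokens` in place,
// preserving the order of the remaining tokens.
void drop_empty_tokens(std::vector<std::string>& tokens) {
    tokens.erase(
        std::remove_if(tokens.begin(), tokens.end(),
                       [](const std::string& token) { return token.empty(); }),
        tokens.end());
}
```

Note that this also drops empty tokens in the middle of the line, so it is only appropriate when an empty field never carries meaning.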

How do I use boost split to split a string and ignore empty values?
