Counting tokens when tokenizing a string in C++?

Asked: 2014-05-27 20:07:44

Tags: c++ token tokenize

Java has a simple way to count the tokens produced when you tokenize a string:

import java.util.*;

public class Program
{
    public static void main(String[] args)
    {
        String str =
            "This is/some text/that I am/parsing/using StringTokenizer/.";

        StringTokenizer strTok =
            new StringTokenizer(str, "/", false);

        System.out.println("Count...");
        System.out.println(strTok.countTokens());
    }
}



Output:

Count...
6

Is there a simple way to do this in C++?

2 answers:

Answer 0 (score: 3)

You can use the class std::istringstream together with the function std::getline, passing the delimiter as the third argument. For example

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    char s[] = "This is/some text/that I am/parsing/using StringTokenizer/.";

    std::istringstream is( s );

    size_t count = 0;

    std::string line;

    while ( std::getline( is, line, '/' ) ) ++count;

    std::cout << "There are " << count << " tokens" << std::endl;
}

Output

There are 6 tokens

Or, if you also want to keep the tokens, store them in a vector:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    char s[] = "This is/some text/that I am/parsing/using StringTokenizer/.";

    std::istringstream is( s );

    std::vector<std::string> v;
    std::string line;

    while ( std::getline( is, line, '/' ) ) v.push_back( line );

    std::cout << "There are " << v.size() << " tokens" << std::endl;
}

To build the string again from the tokens in the vector you can use, for example, the following code

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    char s[] = "This is/some text/that I am/parsing/using StringTokenizer/.";

    std::istringstream is( s );

    std::vector<std::string> v;
    std::string line;

    while ( std::getline( is, line, '/' ) ) v.push_back( line );

    std::cout << "There are " << v.size() << " tokens" << std::endl;

    std::string s1;

    bool first = true;
    for ( const std::string &t : v )
    {
        if ( first ) first = false;
        else s1 += '/';

        s1 += t;
    }

    std::cout << s1 << std::endl;
}

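Or, if all you want is to replace one delimiter with another in the original string, you can use the standard algorithm std::replace, declared in the header <algorithm>.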

If your compiler does not support the range-based for loop, you can write the loop that rebuilds the string like this instead

    for ( std::vector<std::string>::size_type i = 0; i < v.size(); i++ )
    {
        if ( i != 0 ) s1 += '/';

        s1 += v[i];
    }

Answer 1 (score: 1)

You could try this:

std::vector<std::string> v(std::istream_iterator<std::string>(std::cin), {});

std::cout << "Count..." << v.size() << "\n";

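This of course tokenizes on whitespace rather than on an arbitrary delimiter. To split on an arbitrary delimiter we need std::getline, but then we no longer have a convenient istream_iterator. Thankfully, this is a solved problem: wrap std::string in a small class whose operator>> calls std::getline with the desired separator. So we write: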

#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace detail 
{
    template <char Sep = '\n'>
    class Line : public std::string 
    { 
        friend std::istream & operator>>(std::istream & is, Line & line)
        {   
            return std::getline(is, line, Sep);
        }
    };
}

int main()
{
    std::vector<std::string> v(std::istream_iterator<detail::Line<'/'>>(std::cin), {});
    std::cout << "Count..." << v.size() << "\n";
    for (auto const & s : v) std::cout << s << "\n";
}

If you want to tokenize an existing string rather than standard input, use a string stream, i.e. replace std::cin with iss, where we have:

#include <sstream>

std::istringstream iss(my_input_string);