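Selective iterator

Date: 2010-06-15 15:42:30

Tags: c++ algorithm stl iterator find

FYI: no Boost. Yes, Boost has this, but I want to reinvent the wheel ;)

Is there some form of selective iterator in C++ (possibly)? What I want is to separate a string like this: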

some:word{or other

into a form like this:

some : word { or other

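I could do this with two loops and find_first_of(":") and ("{"), but that seems (very) inefficient to me. I thought maybe there would be a way to create/define/write an iterator that would iterate over all of these values with for_each. I fear this would have me writing a full-fledged custom iterator class for std::string, which is far too complex.

So I thought maybe something like this would do: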

std::vector<size_t> list;
size_t index = mystring.find(":");
while( index != std::string::npos )
{
    list.push_back(index);
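    // Resume the search one past the last match; searching from
    // list.back() itself would find the same ':' again and loop forever.
    index = mystring.find(":", list.back() + 1);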
}
std::for_each(list.begin(), list.end(), addSpaces(mystring));

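This looks messy to me, and I'm pretty sure there is a more elegant way of doing it, but I can't think of one. Does anyone have a bright idea? Thanks

PS: I did not test the code posted, it's just a quick write-up of what I would try

Update: after considering all the answers, I came up with this, which is to my liking :). It does assume that the last character is a newline; otherwise a trailing {, } or : would not be processed.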

void tokenize( string &line )
{
    char oneBack = ' ';
    char twoBack = ' ';
    char current = ' ';
    size_t length = line.size();

    for( size_t index = 0; index<length; ++index )
    {
        twoBack = oneBack;
        oneBack = current;
        current = line.at( index );
        if( isSpecial(oneBack) )
        {
            if( !isspace(twoBack) ) // insert before
            {
                line.insert(index-1, " ");
                ++index;
                ++length;
            }
            if( !isspace(current) ) // insert after
            {
                line.insert(index, " ");
                ++index;
                ++length;
            }
        }
    }
}

Comments are welcome, as always :)
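The tokenize function above relies on an isSpecial helper that the question never shows. A minimal sketch of what it might look like, assuming the special characters are ':', '{' and '}' (that set is a guess based on the examples in the question, not something it states):

```cpp
#include <cstring>

// Hypothetical helper: the question does not show isSpecial, so the
// character set below is an assumption based on its examples.
inline bool isSpecial(char c)
{
    // std::strchr also matches the terminating '\0', so rule that out first.
    return c != '\0' && std::strchr(":{}", c) != 0;
}
```

Keeping the set in one string literal makes it easy to add further characters later.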

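5 answers:

Answer 0: (score: 4)

This is relatively easy using std::istream_iterator.

What you need to do is define your own class (say, Term). Then define how to read a single "word" (term) from a stream using operator>>.

I don't know your exact definition of a word, so I am using the following:

  • Any consecutive sequence of alphanumeric characters is a term
  • Any single non-white-space character that is not alphanumeric is also a term.

Try this: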

#include <string>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cctype>

class Term
{
    public:

        // This cast operator is not required but makes it easy to use
        // a Term anywhere that a string can normally be used.
        operator std::string const&() const {return value;}

    private:
        // A term is just a string
        // And we friend the operator >> to make sure we can read it.
        friend std::istream& operator>>(std::istream& inStr,Term& dst);
        std::string     value;
};

Now all we need to do is define an operator>> that reads a word according to the rules:

// This function could be a lot neater using some boost regular expressions.
// I just do it manually to show it can be done without boost (as requested)
std::istream& operator>>(std::istream& inStr,Term& dst)
{
   // Note the >> operator drops all preceding white space.
   // So we get the first non white space
   char first;
   inStr >> first;

   // If the stream is in any bad state then stop processing.
   if (inStr)
   {
       if(std::isalnum(static_cast<unsigned char>(first)))
       {
           // Alpha Numeric so read a sequence of characters
           dst.value = first;

           // This is ugly. And needs re-factoring.
           while((first = inStr.get(), inStr) && std::isalnum(static_cast<unsigned char>(first)))
           {
               dst.value += first;
           }

           // Take into account the special case of EOF.
           // And bad stream states.
           if (inStr)
           {
               // The last character read was not EOF and not part of the word.
               // So put it back for use by the next call to read from the stream.
               inStr.putback(first);
           }
           else
           {
               // We know that we have a word so clear any errors to make sure it
               // is used. Let the next attempt to read a word (term) fail at the outer if.
               inStr.clear();
           }
       }
       else
       {
           // It was not alpha numeric so it is a one character word.
           dst.value   = first;
       }
  }
  return inStr;
}

Now we can use it with the standard algorithms via istream_iterator:

int main()
{
    std::string         data    = "some:word{or other";
    std::stringstream   dataStream(data);


    std::copy(  // Read the stream one Term at a time.
                std::istream_iterator<Term>(dataStream),
                std::istream_iterator<Term>(),

                // Note the ostream_iterator is using a std::string
                // This works because a Term can be converted into a string.
                std::ostream_iterator<std::string>(std::cout, "\n")
             );

}

Output:

> ./a.exe
some
:
word
{
or
other
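To get the single-line form asked for in the question, the same two rules can also be applied directly to a string and the terms joined with spaces. This is only a sketch that bypasses the stream machinery; splitTerms and joinWithSpaces are hypothetical names, not part of the answer above:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits text into terms using the two rules above.
std::vector<std::string> splitTerms(const std::string& text)
{
    std::vector<std::string> terms;
    std::string::size_type i = 0;
    while (i < text.size())
    {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c))
        {
            ++i;                       // white space only separates terms
        }
        else if (std::isalnum(c))
        {
            // Rule 1: a run of alphanumeric characters is one term.
            std::string::size_type start = i;
            while (i < text.size() &&
                   std::isalnum(static_cast<unsigned char>(text[i])))
            {
                ++i;
            }
            terms.push_back(text.substr(start, i - start));
        }
        else
        {
            // Rule 2: any other non-space character is a term by itself.
            terms.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return terms;
}

// Hypothetical helper: joins the terms with single spaces.
std::string joinWithSpaces(const std::vector<std::string>& terms)
{
    std::string result;
    for (std::vector<std::string>::size_type k = 0; k < terms.size(); ++k)
    {
        if (k != 0) result += ' ';
        result += terms[k];
    }
    return result;
}
```

Calling joinWithSpaces(splitTerms("some:word{or other")) yields "some : word { or other".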

Answer 1: (score: 1)

std::string const str = "some:word{or other";

std::string result;
result.reserve(str.size());
for (std::string::const_iterator it = str.begin(), end = str.end();
     it != end; ++it)
{
  if (isalnum(*it))
  {
    result.push_back(*it);
  }
  else
  {
    result.push_back(' '); result.push_back(*it); result.push_back(' ');
  }
}

An insert-based version, for speed:

std::string str = "some:word{or other";

for (std::string::iterator it = str.begin(), end = str.end(); it != end; ++it)
{
  if (!isalnum(*it))
  {
    it = str.insert(it, ' ') + 2;
    it = str.insert(it, ' ');
    end = str.end();
  }
}

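Note that std::string::insert inserts before the iterator passed and returns an iterator to the newly inserted character. The assignment is important, because the buffer may have been reallocated at another memory location (iterators are invalidated by the insertion). Also note that you cannot keep end for the whole loop; it has to be recomputed after each insertion.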
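One way to sidestep the invalidation issue is to work with indexes instead of iterators, since an index stays meaningful after a reallocation. A minimal sketch, assuming that is acceptable here (padSpecials is a hypothetical name, and unlike the loop above it leaves existing white space alone):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: surrounds every character that is neither
// alphanumeric nor white space with single spaces, in place.
void padSpecials(std::string& str)
{
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(str[i]);
        if (!std::isalnum(c) && !std::isspace(c))
        {
            str.insert(i, 1, ' ');      // space before; the special is now at i + 1
            str.insert(i + 2, 1, ' ');  // space after the special
            i += 2;                     // continue after the inserted trailing space
        }
    }
}
```

Because str.size() is re-read on every pass, there is no cached end to refresh.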

Answer 2: (score: 0)

How about:

std::string::const_iterator it, end = mystring.end();
for(it = mystring.begin(); it != end; ++it) {
  if ( !isalnum( *it ))
    list.push_back(it - mystring.begin()); // store the index, since list holds size_t
}

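This way you only have to traverse the string once, and isalnum from ctype.h seems to do what you want. Of course, the code above is very simplistic and incomplete and only suggests a solution.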
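Once the positions are collected, the spaces can be added from the last position to the first, so that earlier positions are not shifted by later insertions. A rough sketch, assuming indexes are stored rather than iterators (collectSpecials and insertSpacesAt are hypothetical names, and white space is skipped here, which the snippet above does not do):

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: indexes of characters that are neither
// alphanumeric nor white space, found in a single pass.
std::vector<std::string::size_type> collectSpecials(const std::string& s)
{
    std::vector<std::string::size_type> positions;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && !std::isspace(c))
            positions.push_back(i);
    }
    return positions;
}

// Hypothetical helper: puts a space on both sides of each position.
// Walking backwards keeps the not-yet-processed indexes valid.
void insertSpacesAt(std::string& s,
                    const std::vector<std::string::size_type>& positions)
{
    for (std::vector<std::string::size_type>::size_type k = positions.size(); k-- > 0; )
    {
        s.insert(positions[k] + 1, 1, ' ');  // after the special character
        s.insert(positions[k], 1, ' ');      // before it
    }
}
```

Each insertion still shifts the tail of the string, so building a new string may be cheaper for long inputs with many special characters.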

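Answer 3: (score: 0)

"... there is a more elegant way of doing it."

I do not know how Boost implements this, but the traditional way is to feed the input string character by character into an FSM that detects where tokens (words, symbols) start and end.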
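A minimal sketch of that idea, assuming just two states (inside a word, or between tokens) and treating every non-alphanumeric, non-space character as a one-character symbol. The names State and scanTokens are made up for illustration:

```cpp
#include <cctype>
#include <string>
#include <vector>

enum State { BetweenTokens, InWord };

// Hypothetical helper: feeds the input into a two-state machine,
// one character at a time, and collects the tokens it recognises.
std::vector<std::string> scanTokens(const std::string& input)
{
    std::vector<std::string> tokens;
    std::string word;
    State state = BetweenTokens;

    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        char ch = input[i];
        bool isWordChar = std::isalnum(static_cast<unsigned char>(ch)) != 0;

        if (isWordChar)
        {
            // BetweenTokens -> InWord, or stay InWord: extend the word.
            word += ch;
            state = InWord;
            continue;
        }

        // Any non-word character ends the current word, if there is one.
        if (state == InWord)
        {
            tokens.push_back(word);
            word.clear();
            state = BetweenTokens;
        }
        // Symbols are emitted on their own; white space is dropped.
        if (!std::isspace(static_cast<unsigned char>(ch)))
            tokens.push_back(std::string(1, ch));
    }

    // End of input also ends a word that is still open.
    if (state == InWord)
        tokens.push_back(word);
    return tokens;
}
```

More states (quoted strings, numbers, multi-character operators) can be added to the enum without changing the overall loop.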

  

"I could do this with two loops and find_first_of(":") and ("{")"

One loop using std::find_first_of() should be enough.
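A sketch of the single-loop version, using the std::string::find_first_of member function rather than the std::find_first_of algorithm. The function name spaceOut and the idea of passing the special characters as a string are placeholders for illustration:

```cpp
#include <string>

// Hypothetical helper: returns a copy of input with a space on each
// side of every character found in specials.
std::string spaceOut(const std::string& input, const std::string& specials)
{
    std::string result;
    std::string::size_type start = 0;
    std::string::size_type pos = input.find_first_of(specials, start);

    while (pos != std::string::npos)
    {
        result.append(input, start, pos - start);  // text before the special
        result += ' ';
        result += input[pos];
        result += ' ';
        start = pos + 1;                           // resume after the special
        pos = input.find_first_of(specials, start);
    }
    result.append(input, start, std::string::npos); // remaining tail
    return result;
}
```

For the string in the question, spaceOut("some:word{or other", ":{") gives "some : word { or other". Building a new string avoids repeated insertions into the middle of the original.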

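Though I am still a big fan of FSMs for this kind of parsing task.

P.S. Similar question

Answer 4: (score: 0)

Are you looking to tokenize the input string, a la strtok?

If so, here is a tokenizing function you can use. It takes an input string and a string of delimiters (each character in that string is a possible delimiter), and it returns a vector of tokens. Each token is a pair holding the delimited string and the delimiter that preceded it: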

#include <cstdlib>
#include <vector>
#include <string>
#include <functional>
#include <iostream>
#include <algorithm>
using namespace std;

//  FUNCTION :      tokenize(const string& str, const string& delims, bool bCaseSensitive)
//  PARAMETERS :    str     String to be tokenized.
//                  delims  String of delimiter characters -- each character in the string is a delimiter.
//                  bCaseSensitive  Currently unused.
//  DESCRIPTION :   Tokenizes a string, much in the same way as strtok does.  The input string is not modified.  The
//                  function is called once to tokenize a string, and all the tokens are returned at once.
//  RETURNS :       Returns a vector of tokens.  Each token is a pair: first is the delimiter that preceded the
//                  token (0 for the first token), second is the token text without the delimiter.  For a
//                  non-empty string that does not start with a delimiter, the number of elements is N+1, where
//                  N is the number of delimiter characters found in the string.  If one token is empty (as with
//                  the string "string1##string3", where the delimiter is '#'), then that element in the vector
//                  holds an empty string.
//  NOTES :         A delimiter in the very first position is treated as data, not as a delimiter.
//
typedef pair<char,string> token;    // first = delimiter, second = data
inline vector<token> tokenize(const string& str, const string& delims, bool bCaseSensitive=false)   // tokenizes a string, returns a vector of tokens
{
    (void)bCaseSensitive;   // unused; silences the compiler warning

    // prologue
    vector<token> vRet;
    // tokenize input string
    for( string::const_iterator itA = str.begin(), it=itA; it != str.end(); it = find_first_of(++it,str.end(),delims.begin(),delims.end()) )
    {
        // prologue
        // find end of token
        string::const_iterator itEnd = find_first_of(it+1,str.end(),delims.begin(),delims.end());
        // add string to output
        if( it == itA ) vRet.push_back(make_pair(0,string(it,itEnd)));
        else            vRet.push_back(make_pair(*it,string(it+1,itEnd)));
        // epilogue
    }
    // epilogue
    return vRet;
}

int main()
{
    string input = "some:word{or other";
    typedef vector<token> tokens;
    tokens toks = tokenize(input, " :{");
    cout << "Input: '" << input << "' # Tokens: " << toks.size() << "\n";
    for( tokens::iterator it = toks.begin(); it != toks.end(); ++it )
    {
        cout << "  Token : '" << it->second << "', Delimiter: '" << it->first << "'\n";
    }
    return 0;

}