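FYI: no Boost. Yes, it has this, but I want to reinvent the wheel ;)
Is there some form of selective iterator in C++ (possibly)? What I want is to split a string like this: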
some:word{or other
into this form:
some : word { or other
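I can do this with two loops and find_first_of(":") and ("{"), but that seems (very) inefficient to me. I thought there might be a way to create/define/write an iterator that would visit all of these positions with for_each. I fear this would have me writing a full-fledged, far too complex custom iterator class for std::string.
So I thought maybe this would do: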
std::vector<size_t> list;
size_t index = mystring.find(":");
while( index != std::string::npos )
{
list.push_back(index);
index = mystring.find(":", index + 1); // search past the last hit, otherwise the loop never ends
}
std::for_each(list.begin(), list.end(), addSpaces(mystring));
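This looks messy to me, and I'm quite sure a more elegant way of doing this exists, but I can't think of it. Does anyone have a good idea? Thanks
PS: I did not test the posted code; it is just a quick write-up of what I would try.
Update: after considering all the answers, I came up with this, which is to my liking :). It does assume the last character is a newline; otherwise a trailing {, } or : will not be processed.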
void tokenize( string &line )
{
char oneBack = ' ';
char twoBack = ' ';
char current = ' ';
size_t length = line.size();
for( size_t index = 0; index<length; ++index )
{
twoBack = oneBack;
oneBack = current;
current = line.at( index );
if( isSpecial(oneBack) )
{
if( !isspace(twoBack) ) // insert before
{
line.insert(index-1, " ");
++index;
++length;
}
if( !isspace(current) ) // insert after
{
line.insert(index, " ");
++index;
++length;
}
}
}
}
Comments welcome, as always :)
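For anyone who wants to try the idea in the update: isSpecial is never shown in the question, so the sketch below assumes it matches ':', '{' and '}'. It is a variation rather than the exact loop above, since it builds a new string instead of inserting in place, which removes the need for a trailing newline. The name spaceOut is made up for this example.

```cpp
#include <cctype>
#include <string>

// Assumption: the question never shows isSpecial; here it matches ':', '{' and '}'.
bool isSpecial(char c)
{
    return c == ':' || c == '{' || c == '}';
}

// Hypothetical helper: returns a copy of line with one space on each side of
// every special character, without doubling spaces that are already there.
std::string spaceOut(const std::string& line)
{
    std::string out;
    out.reserve(line.size() * 2);
    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (!isSpecial(c))
        {
            out.push_back(c);
            continue;
        }
        // Space before, unless the output is empty or already ends in whitespace.
        if (!out.empty() && !std::isspace(static_cast<unsigned char>(out[out.size() - 1])))
            out.push_back(' ');
        out.push_back(c);
        // Space after, unless this is the last character or whitespace follows.
        if (i + 1 < line.size() && !std::isspace(static_cast<unsigned char>(line[i + 1])))
            out.push_back(' ');
    }
    return out;
}
```

With these assumptions, spaceOut("some:word{or other") gives "some : word { or other", and a special character at the very end of the line is handled as well.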
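Answer 0 (score: 4)
This is relatively easy using std::istream_iterator.
What you need to do is define your own class (say Term), then define how to read a single "word" (term) from a stream using operator >>.
I don't know your exact definition of a word, so I am using the following one (it is what the code below implements): any run of consecutive alphanumeric characters is a term, and any other non-whitespace character is a term on its own.
Try this: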
#include <string>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cctype>
class Term
{
public:
// This cast operator is not required but makes it easy to use
// a Term anywhere that a string can normally be used.
operator std::string const&() const {return value;}
private:
// A term is just a string
// And we friend the operator >> to make sure we can read it.
friend std::istream& operator>>(std::istream& inStr,Term& dst);
std::string value;
};
Now all we have to do is define an operator >> that reads one word according to the rules:
// This function could be a lot neater using some boost regular expressions.
// I just do it manually to show it can be done without boost (as requested)
std::istream& operator>>(std::istream& inStr, Term& dst)
{
    // Note the >> operator drops all preceding white space.
    // So we get the first non white space character.
    char first;
    inStr >> first;

    // If the stream is in any bad state then stop processing.
    if (inStr)
    {
        if (std::isalnum(static_cast<unsigned char>(first)))
        {
            // Alphanumeric, so read a sequence of characters.
            dst.value = first;

            // This is ugly and needs refactoring.
            while ((first = inStr.get(), inStr) && std::isalnum(static_cast<unsigned char>(first)))
            {
                dst.value += first;
            }

            if (inStr)
            {
                // The last character read was not part of the word,
                // so put it back for use by the next call to read from the stream.
                inStr.putback(first);
            }
            else
            {
                // We hit EOF (or a bad state) after the word. We know that we have
                // a word, so clear any errors to make sure it is used. Let the next
                // attempt to read a word (term) fail at the outer if.
                inStr.clear();
            }
        }
        else
        {
            // It was not alphanumeric, so it is a one character word.
            dst.value = first;
        }
    }
    return inStr;
}
Now we can use it in standard algorithms by using istream_iterator:
int main()
{
std::string data = "some:word{or other";
std::stringstream dataStream(data);
std::copy( // Read the stream one Term at a time.
std::istream_iterator<Term>(dataStream),
std::istream_iterator<Term>(),
// Note the ostream_iterator is using a std::string
// This works because a Term can be converted into a string.
std::ostream_iterator<std::string>(std::cout, "\n")
);
}
Output:
> ./a.exe
some
:
word
{
or
other
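The question asked for the pieces joined by single spaces rather than printed one per line. Here is a minimal sketch of that, under a few assumptions: Word is a compact stand-in for the Term class above (same reading rules, but using peek() instead of get() and putback()), and joinTerms is a hypothetical helper name.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Compact stand-in for Term: a run of alphanumerics, or one other character.
struct Word
{
    std::string value;
};

std::istream& operator>>(std::istream& in, Word& dst)
{
    char c;
    if (!(in >> c))            // skips leading whitespace; fails at end of input
        return in;
    dst.value.assign(1, c);
    if (std::isalnum(static_cast<unsigned char>(c)))
    {
        // Extend the word while the next character is alphanumeric.
        // peek() looks at the next character without consuming it.
        for (int next = in.peek();
             next != std::char_traits<char>::eof() && std::isalnum(next);
             next = in.peek())
        {
            dst.value += static_cast<char>(in.get());
        }
        // peek() sets eofbit at end of input; clear it so this word still counts.
        in.clear();
    }
    return in;
}

// Hypothetical helper: reads text term by term and re-joins with single spaces.
std::string joinTerms(const std::string& text)
{
    std::istringstream in(text);
    std::string out;
    Word w;
    while (in >> w)
    {
        if (!out.empty())
            out += ' ';
        out += w.value;
    }
    return out;
}
```

Here joinTerms("some:word{or other") produces "some : word { or other". Runs of whitespace in the input collapse to a single space, which may or may not be what you want.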
Answer 1 (score: 1)
std::string const str = "some:word{or other";
std::string result;
result.reserve(str.size());
for (std::string::const_iterator it = str.begin(), end = str.end();
it != end; ++it)
{
if (isalnum(*it))
{
result.push_back(*it);
}
else
{
result.push_back(' '); result.push_back(*it); result.push_back(' ');
}
}
A version using insert, as a speed-up:
std::string str = "some:word{or other";
for (std::string::iterator it = str.begin(), end = str.end(); it != end; ++it)
{
if (!isalnum(*it))
{
it = str.insert(it, ' ') + 2;
it = str.insert(it, ' ');
end = str.end();
}
}
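Note that std::string::insert inserts BEFORE the iterator passed to it and returns an iterator to the newly inserted character. The assignment is important, since the buffer may have been reallocated at another memory location (the iterators are invalidated by the insertion). Also note that you cannot keep end for the whole loop; it has to be recomputed each time you insert.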
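One way to sidestep the invalidation issue entirely is to walk the string by index instead of by iterator, since an index stays meaningful after a reallocation. The sketch below assumes that approach; padInPlace is a made-up name, and unlike the loop above it leaves existing whitespace alone, which is a choice made for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: pads every non-alphanumeric, non-space character with
// a space on each side, modifying str in place.
void padInPlace(std::string& str)
{
    // str.size() is re-read on every pass because the string grows as we insert.
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(str[i]);
        if (std::isalnum(c) || std::isspace(c))
            continue;
        str.insert(i, 1, ' ');      // space before; the symbol is now at i + 1
        str.insert(i + 2, 1, ' ');  // space after the symbol
        i += 2;                     // the loop's ++i then steps past the trailing space
    }
}
```

Applied to "some:word{or other" this leaves "some : word { or other" in the string. Each insert still shifts the tail of the string, so it is not obviously faster than building a second string.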
Answer 2 (score: 0)
How about:
std::string::const_iterator it, end = mystring.end();
for(it = mystring.begin(); it != end; ++it) {
if ( !isalnum( *it ))
list.push_back(it);
}
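This way you only have to iterate through the string once, and isalnum from ctype.h seems to do what you want. Of course, the code above is very simplistic and incomplete, and only suggests a solution.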
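To make that a bit more concrete, here is one possible completion. It is only a sketch: it stores indices rather than iterators (iterators into the string would be invalidated once spaces are inserted), it skips whitespace, and the names symbolPositions and padAt are made up for this example.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: one pass over s to record where the symbols are.
std::vector<std::string::size_type> symbolPositions(const std::string& s)
{
    std::vector<std::string::size_type> positions;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && !std::isspace(c))
            positions.push_back(i);
    }
    return positions;
}

// Hypothetical helper: pads each recorded position, last one first, so the
// earlier indices stay valid while the string grows.
void padAt(std::string& s, const std::vector<std::string::size_type>& positions)
{
    for (std::vector<std::string::size_type>::const_reverse_iterator it = positions.rbegin();
         it != positions.rend(); ++it)
    {
        s.insert(*it + 1, 1, ' ');  // after the symbol
        s.insert(*it, 1, ' ');      // before the symbol
    }
}
```

Calling padAt(mystring, symbolPositions(mystring)) on "some:word{or other" yields "some : word { or other". Working from the back is what keeps the stored indices correct.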
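Answer 3 (score: 0)
A more elegant way does exist.
I do not know how Boost implements this, but the traditional way is to feed the input string character by character into an FSM that detects where tokens (words, symbols) start and end.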
"I can do this with two loops and find_first_of(":") and ("{")"
One loop with std::find_first_of() would suffice.
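As a rough illustration of the single-loop idea, the sketch below searches for any of the symbols at once with std::find_first_of and copies the text in between. The helper name padSymbols and the choice to build a new string are assumptions made for this example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: copies text into a new string, surrounding every
// character found in symbols with single spaces.
std::string padSymbols(const std::string& text, const std::string& symbols)
{
    std::string out;
    std::string::const_iterator from = text.begin();
    const std::string::const_iterator end = text.end();
    while (from != end)
    {
        // Next symbol at or after from, or end if there is none.
        const std::string::const_iterator hit =
            std::find_first_of(from, end, symbols.begin(), symbols.end());
        out.append(from, hit);      // plain run before the symbol
        if (hit == end)
            break;
        out += ' ';
        out += *hit;
        out += ' ';
        from = hit + 1;             // continue right after the symbol
    }
    return out;
}
```

For example, padSymbols("some:word{or other", ":{") returns "some : word { or other". A symbol at the very end of the input leaves a trailing space, which this sketch does not trim.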
Though I am still a big fan of FSMs for parsing tasks like this.
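A very small FSM for this input could look like the sketch below. It assumes just two states and treats words as alphanumeric runs and everything else that is not whitespace as a one-character symbol; State and scanTokens are names invented for this example.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical two-state machine: either between tokens or inside a word.
enum State { Between, InWord };

// Hypothetical helper: splits text into words (alphanumeric runs) and
// one-character symbols; whitespace only separates tokens.
std::vector<std::string> scanTokens(const std::string& text)
{
    std::vector<std::string> tokens;
    State state = Between;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalnum(c))
        {
            if (state == Between)
            {
                tokens.push_back(std::string());  // a new word starts here
                state = InWord;
            }
            tokens.back() += text[i];
        }
        else
        {
            state = Between;                      // any other character ends the word
            if (!std::isspace(c))
                tokens.push_back(std::string(1, text[i]));  // symbol token
        }
    }
    return tokens;
}
```

For "some:word{or other" this yields the six tokens some, :, word, {, or, other. Adding more token kinds (numbers, quoted strings) means adding states, which is where the FSM approach starts to pay off.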
P.S. Similar question
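Answer 4 (score: 0)
Are you looking to tokenize the input string, a la strtok?
If so, here is a tokenizing function you can use. It takes an input string and a string of delimiters (each character in that string is a possible delimiter), and it returns a vector of tokens. Each token is a tuple holding the delimited string and the delimiter used in that case: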
#include <cstdlib>
#include <vector>
#include <string>
#include <functional>
#include <iostream>
#include <algorithm>
using namespace std;
// FUNCTION : tokenize(const string& str, const string& delims, bool bCaseSensitive)
// PARAMETERS : str string to be tokenized.
// delims string of individual delimiter characters -- each character in the string is a delimiter
// bCaseSensitive currently unused
// DESCRIPTION : Tokenizes a string, much in the same way as strtok does. The input string is not modified. The
// function is called once to tokenize a string, and all the tokens are returned at once.
// RETURNS : Returns a vector of tokens. Each element in the vector is one token, stored as a pair of the
// delimiter that preceded it (0 for the first token) and the token text. The delimiter character is
// not included in the text. For a non-empty input the number of elements is N+1, where N is the number
// of delimiters found after the first character. If one token is an empty string (as with the
// string "string1##string3", where the delimiter is '#'), then that element's text is an empty string.
// NOTES :
//
typedef pair<char,string> token; // first = delimiter, second = data
inline vector<token> tokenize(const string& str, const string& delims, bool bCaseSensitive=false) // tokenizes a string, returns a vector of tokens
{
(void)bCaseSensitive; // unused; the cast only silences compiler warnings
// prologue
vector<token> vRet;
// tokenize input string
for( string::const_iterator itA = str.begin(), it=itA; it != str.end(); it = find_first_of(++it,str.end(),delims.begin(),delims.end()) )
{
// prologue
// find end of token
string::const_iterator itEnd = find_first_of(it+1,str.end(),delims.begin(),delims.end());
// add string to output
if( it == itA ) vRet.push_back(make_pair(0,string(it,itEnd)));
else vRet.push_back(make_pair(*it,string(it+1,itEnd)));
// epilogue
}
// epilogue
return vRet;
}
using namespace std;
int main()
{
string input = "some:word{or other";
typedef vector<token> tokens;
tokens toks = tokenize(input.c_str(), " :{");
cout << "Input: '" << input << "' # Tokens: " << toks.size() << "\n";
for( tokens::iterator it = toks.begin(); it != toks.end(); ++it )
{
cout << " Token : '" << it->second << "', Delimiter: '" << it->first << "'\n";
}
return 0;
}