Question

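I am trying to remove comments, blank lines, and extra whitespace from a text file and then tokenize the remaining elements. Each token needs one space before and after it.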

exampleFile.txt
var

/* declare variables */a1 ,
b2a ,     c,

This is what works so far:

string line; //line: represents one line of text from file
ifstream InputFile("exampleFile.txt", ios::in); //read from exampleFile.txt

//Remove comments
while (InputFile && getline(InputFile, line, '\0'))
{
    while (line.find("/*") != string::npos)
    {
        size_t Begin = line.find("/*");
        line.erase(Begin, (line.find("*/", Begin) - Begin) + 2);
        // Start at Begin, erase from Begin to where */ is found
    }   
}

This removes the comments, but I can't figure out a way to tokenize the text while that is happening.

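So my questions are:

Is it possible to remove comments, whitespace, and blank lines and tokenize everything inside this while statement?
How can I implement a function that adds spaces between tokens before they are tokenized? A token like c, needs to be recognized as c and , individually.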

Thanks in advance for your help!

Answer 1

If you need to skip whitespace characters and do not care about newlines, I would suggest reading the file with operator>>. You can simply write:

std::string word;
bool isComment = false;
while(file >> word)
{
    if (isInsideComment(word, isComment))
        continue;

    // do processing of the tokens here
    std::cout << word << std::endl;
}

The helper function could be implemented as follows:

bool isInsideComment(std::string &word, bool &isComment)
{
    const std::string tagStart = "/*";
    const std::string tagStop = "*/";

    // match start marker
    if (word.size() >= tagStart.size() && std::equal(tagStart.rbegin(), tagStart.rend(), word.rbegin())) // ends with tagStart
    {
        isComment = true;
        if (word == tagStart)
            return true;

        word = word.substr(0, word.find(tagStart));
        return false;
    }

    // match end marker
    if (isComment)
    {
        if (word.size() >= tagStop.size() && std::equal(tagStop.begin(), tagStop.end(), word.begin())) // starts with tagStop
        {
            isComment = false;
            word = word.substr(tagStop.size());
            return false;
        }

        return true;
    }

    return false;
}

For your example, this prints:

var
a1
,
b2a
,
c,

In case you are interested, the logic above should also handle multi-line comments.

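However, note that the function implementation should be adapted to your assumptions about the comment tokens. For example, are they always separated from other words by whitespace? Or could an expression such as var1/*comment*/var2 appear in the input? The example above will not work in that case.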
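One way to stop relying on whitespace between tokens is to split the text by character class instead. The following is only a minimal sketch under stated assumptions: splitTokens and joinWithSpaces are hypothetical helpers that are not part of the code above, a token is assumed to be either a run of letters, digits, and underscores or a single other non-whitespace character, and the input is assumed to have had its comments removed already.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): splits text into tokens.
// Assumption: a run of letters, digits or '_' is one token, and every other
// non-whitespace character is a token of its own.
std::vector<std::string> splitTokens(const std::string &text)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size())
    {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isspace(ch))
        {
            ++i; // spaces, tabs and newlines only separate tokens
        }
        else if (std::isalnum(ch) || ch == '_')
        {
            const std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
            tokens.push_back(text.substr(start, i - start));
        }
        else
        {
            tokens.push_back(std::string(1, text[i])); // e.g. "," on its own
            ++i;
        }
    }
    return tokens;
}

// Hypothetical helper: joins tokens so that each one has a single space
// before it and a single space after it.
std::string joinWithSpaces(const std::vector<std::string> &tokens)
{
    if (tokens.empty())
        return "";
    std::string out = " ";
    for (const std::string &t : tokens)
        out += t + " ";
    return out;
}
```

With these assumptions, splitTokens("b2a ,     c,") yields the four tokens b2a, a comma, c, and a comma, so c, is recognized as two separate tokens, and blank lines and extra spaces disappear because whitespace is never stored.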

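Hence, another option is (as you have already started implementing) to read lines or even whole chunks of data from the file, so that start and end comment markers can be matched, locate the comment markers with find or a similar search facility, and remove the comments afterwards.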
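The find-based approach described above could look roughly like the sketch below. stripBlockComments is a hypothetical name, and two choices in it are assumptions rather than anything stated in the thread: each comment is replaced by a single space so that var1/*comment*/var2 does not collapse into var1var2, and a comment without a closing marker is treated as running to the end of the text.

```cpp
#include <string>

// Hypothetical helper: returns a copy of text with every /* ... */ comment
// replaced by one space.
// Assumption: an unterminated comment extends to the end of the text.
std::string stripBlockComments(std::string text)
{
    const std::string tagStart = "/*";
    const std::string tagStop = "*/";

    std::size_t begin = text.find(tagStart);
    while (begin != std::string::npos)
    {
        // Search for the closing marker only after the opening one,
        // so that "/*/" is not mistaken for a complete comment.
        const std::size_t end = text.find(tagStop, begin + tagStart.size());
        if (end == std::string::npos)
        {
            text.erase(begin); // no closing marker: drop the rest
            break;
        }
        text.replace(begin, end + tagStop.size() - begin, " ");
        begin = text.find(tagStart, begin + 1);
    }
    return text;
}
```

The function expects the relevant text in one string, for example the whole file read with getline and a '\0' delimiter as in the question, so that a comment spanning several lines is seen in one piece. The result can then be tokenized without any comment handling.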

Tokenizing elements in a text file by removing comments, extra spaces, and blank lines in C++
