Question

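I am writing a text parser that needs to be able to strip comments from lines. The language is fairly simple: every comment is started by a # character, and removing everything after it would be easy, except that I have to handle the possibility of a # inside a string.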

So my question is: given a string such as
Value="String#1";"String#2"; # This is an array of "-delimited strings, "Like this"
how can I best extract the substring
Value="String#1";"String#2"; (note the trailing space)?

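Note that the comment may contain quotes. Also, a line may use either " or ' as its quote character, although the choice is consistent across the whole line, and it is known in advance if that matters. Quotes within strings are escaped with \.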

Answer 1

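#include <string>

// Returns the line with any trailing # comment removed.
// The argument is taken by const reference so that str.begin() and `it`
// have the same iterator type, as the std::string range constructor requires.
std::string stripComment(const std::string& str) {
    bool escaped = false;
    bool inSingleQuote = false;
    bool inDoubleQuote = false;
    for(std::string::const_iterator it = str.begin(); it != str.end(); ++it) {
         if(escaped) {
             // The previous character was a backslash: take this one literally.
             escaped = false;
         } else if(*it == '\\' && (inSingleQuote || inDoubleQuote)) {
             escaped = true;
         } else if(inSingleQuote) {
             if(*it == '\'') {
                 inSingleQuote = false;
             }
         } else if(inDoubleQuote) {
             if(*it == '"') {
                 inDoubleQuote = false;
             }
         } else if(*it == '\'') {
             inSingleQuote = true;
         } else if(*it == '"') {
             inDoubleQuote = true;
         } else if(*it == '#') {
             // A # outside any string starts the comment.
             return std::string(str.begin(), it);
         }
    }
    return str;
}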

Edit: or, as a more textbook FSM,

#include <string>

std::string stripComment(const std::string& str) {
    // Columns: other character, backslash, single quote, double quote.
    static const int states[5][4] = {
        {0, 0, 1, 2},  //outside any string
        {1, 3, 0, 1},  //single quoted string
        {2, 4, 2, 0},  //double quoted string
        {1, 1, 1, 1},  //escape in single quoted string
        {2, 2, 2, 2},  //escape in double quoted string
    };
    int state = 0;
    for(std::string::const_iterator it = str.begin(); it != str.end(); ++it) {
        switch(*it) {
            case '\\':
                state = states[state][1];
                break;
            case '\'':
                state = states[state][2];
                break;
            case '"':
                state = states[state][3];
                break;
            case '#':
                if(state == 0) {
                    return std::string(str.begin(), it);
                }
                // Inside a string, # is an ordinary character:
                // deliberate fall-through to default.
            default:
                state = states[state][0];
        }
    }
    return str;
}

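The states array defines the transitions between the states of the FSM.

The first index is the current state: 0, 1, 2, 3, or 4.

The second index is chosen by the character: 0 for any other character, 1 for \, 2 for ', and 3 for ".

The array gives the next state for the current state and character.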
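For readers who find the bare integers hard to follow, the same table can be written with named states and character classes. This is only a sketch of the lookup described above; the names State, CharClass, classify, and stripCommentTable are made up for illustration and are not part of the answer's code.

```cpp
#include <string>

// Hypothetical names, chosen only to label the rows and columns of the table.
enum State { Plain, InSingle, InDouble, EscSingle, EscDouble, StateCount };
enum CharClass { Other, Backslash, SingleQuote, DoubleQuote, ClassCount };

// Map a character to the column of the transition table.
static CharClass classify(char c) {
    switch (c) {
        case '\\': return Backslash;
        case '\'': return SingleQuote;
        case '"':  return DoubleQuote;
        default:   return Other;
    }
}

std::string stripCommentTable(const std::string& line) {
    // Rows: current state. Columns: Other, Backslash, SingleQuote, DoubleQuote.
    static const State next[StateCount][ClassCount] = {
        { Plain,    Plain,     InSingle, InDouble }, // outside any string
        { InSingle, EscSingle, Plain,    InSingle }, // single-quoted string
        { InDouble, EscDouble, InDouble, Plain    }, // double-quoted string
        { InSingle, InSingle,  InSingle, InSingle }, // after a backslash in single quotes
        { InDouble, InDouble,  InDouble, InDouble }, // after a backslash in double quotes
    };
    State state = Plain;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        // A # only starts a comment when we are outside every string.
        if (line[i] == '#' && state == Plain) {
            return line.substr(0, i);
        }
        state = next[state][classify(line[i])];
    }
    return line;
}
```

Calling stripCommentTable on the example line from the question should leave everything up to and including the space before the comment's #.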

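FYI, both versions assume that a backslash escapes any character inside a string. At a minimum you need it to escape a backslash, so that you can have a string that ends in a backslash.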
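Since the question says the quote character is known in advance and is consistent across a line, a smaller variant under the same escape assumption could take that character as a parameter. This is a sketch, not the answer's code: stripCommentKnownQuote and its quote parameter are hypothetical names introduced here.

```cpp
#include <string>

// Hypothetical helper: `quote` is the delimiter known in advance for this line.
// Assumes, as above, that a backslash inside a string escapes any character.
std::string stripCommentKnownQuote(const std::string& line, char quote) {
    bool inString = false;
    bool escaped = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (escaped) {
            escaped = false;           // this character was escaped; take it literally
        } else if (inString && c == '\\') {
            escaped = true;            // backslash escapes the next character
        } else if (c == quote) {
            inString = !inString;      // unescaped delimiter opens or closes a string
        } else if (c == '#' && !inString) {
            return line.substr(0, i);  // comment starts here
        }
    }
    return line;
}
```

With '"' as the delimiter, a string that ends in an escaped backslash closes correctly, so a # after it is still treated as the start of a comment. The other quote character is treated as ordinary text, so an apostrophe inside a double-quoted string needs no escaping.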
