Smart tokenizer in C

Time: 2012-01-05 19:09:36

Tags: c tokenize

I have to write a tokenizer in C/C++ that parses a string of the form

char pSignature[] = "work.\\top =>\\p1 =:5:p2=:10:=>interface_ports:=dut";

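and fills in pairs such as \p1 5 and p2 10. Can anyone suggest a good approach? The problem with strtok is how to make it stop before =>interface_ports is reached. Here is the code I wrote: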

#include <cstring>
#include <iostream>
using namespace std;

int main() {
  char pSignature[] = "work.\\top =>\\p1 =:5:p2=:10:=>interface_ports:=dut";
  char* mParamName = NULL;
  char* mParamVal = NULL;
  char* sTemp = pSignature;
  bool bIsLibState = true;
  bool bIsModState = false;
  bool bIsEscaped = false;
  while (*sTemp != '\0') {
    // Extract library ..
    if (bIsLibState) {
      if (*sTemp == '.') {
        bIsLibState = false;
        bIsModState = true;
      }
      sTemp++;
    }
    else if (bIsModState) {
    // Extract moduleName..
      if (*sTemp == '\\') {
        bIsEscaped = true;
      }
      if (bIsEscaped) {
        if (*sTemp == ' ') {
          bIsModState = false;
          bIsEscaped = false;
          sTemp++;
          sTemp += 2;
          break;
        }
        else 
          sTemp++;
      }
      else {
        if (*(sTemp+1) == '=' && *(sTemp+2) == '>') {
          bIsModState = false;
          sTemp++;
          sTemp += 2;
          break;
        }
        else
          sTemp++;
      }
    }
  }

  char* tmp = sTemp;
  char* mStr = sTemp;
  bool bEscaped = false;
  while (tmp != NULL)
  {
    if (*tmp == '\\') {
      tmp = strtok(mStr, " ");
      bEscaped = true;
    }
    else
      tmp = strtok(mStr, "=:");
    // stop at the end of the input or at the "=>interface_ports" section
    if (tmp == NULL || !strcmp(tmp, ">interface_ports"))
      break;
    mStr = NULL;
    mParamName = tmp;

    tmp = strtok(mStr, "=:");
    if (tmp == NULL || !strcmp(tmp, ">interface_ports"))
      break;
    mParamVal = tmp;
    cout << mParamName << "  " << mParamVal << endl;
    //if (mParamName && mParamVal) {
    //  symCharPair* paramPair = new symCharPair(VeIntern(mParamName), mParamVal);
    //  pParamValueList->AddTail(paramPair);
    //}
  }
  return 0;
}

2 Answers:

Answer 0 (score: 1)

If your input string always has this format

work.\\top =>\\p1 =:5:p2=:10:=>interface_ports:=dut

then you can do something simpler:

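 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 int main(void) {
     const char *input = "work.\\top =>\\p1 =:5:p2=:10:=>interface_ports:=dut";

     // find first occurrence of "=>"
     const char *start = strstr(input, "=>");

     // find first occurrence of ":=>"
     const char *end = strstr(input, ":=>");

     if (start == NULL || end == NULL || end < start + 2)
         exit(EXIT_FAILURE);

     size_t length = (size_t)(end - start - 2); // the -2 is to skip the "=>"
     char *pairs = malloc(length + 1);          // +1 for the terminating \0
     if (pairs == NULL)
         exit(EXIT_FAILURE);
     strncpy(pairs, start + 2, length);
     pairs[length] = '\0'; // strncpy does not add the \0 when it copies exactly length chars

     printf("%s\n", pairs); // process pairs here, then release it
     free(pairs);
     return 0;
 }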

Now pairs should contain \p1 =:5:p2=:10, which is probably easier for you to process.

Answer 1 (score: 0)

Read books on parsing techniques and lexing, in particular as used in compilers.

Learn how to write a recursive descent parser, and use parser and lexer generators such as ANTLR, bison (or yacc) and flex.