Simple ParseKit grammar for HTML with substitution variables

Date: 2012-02-15 18:28:41

Tags: iphone ios parsekit

For an iOS app, I want to parse an HTML file that may contain UNIX-style variables to be substituted. For example, the HTML might look like this:

<html>
  <head></head>
  <body>
    <h1>${title}</h1>
    <p>${paragraph1}</p>
    <img src="${image}" />
  </body>
</html>

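I'm trying to create a simple ParseKit grammar that gives me two callbacks: one for pass-through HTML, and one for each variable it detects. To do that, I created the following grammar: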

@start        = Empty | content*;

content       = variable | passThrough;
passThrough   = /[^$]+/;
variable      = '$' '{' Word closeChar;

openChar      = '${';
closeChar     = '}';

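I'm facing at least two problems. First, I originally declared variable as openChar Word closeChar, but that did not work (I still don't know why). Second, and more importantly, the parser stops when it reaches <img src="${image}" /> (that is, a variable inside a quoted string).

My questions are:

  1. How can I modify the grammar so that it works as expected?
  2. Would it be better to use a tokenizer? If so, how should I configure it?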

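1 Answer:

Answer 0 (score: 4)

Developer of ParseKit here. I'll answer both of your questions:

1) You are taking the right approach, but this is a tricky case. There are a few small issues, and your grammar needs to change a bit.

I came up with a grammar that works for me: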

// Tokenizer Directives
@symbolState = '"' "'"; // effectively tells the tokenizer to turn off QuoteState. 
                      // Otherwise, variables enclosed in quotes would not be found (they'd be embedded in quoted strings). 
                      // now single- & double-quotes will be recognized as individual symbols, not start- & end-markers for quoted strings

@symbols = '${'; // declare '${' as a multi-char symbol

@reportsWhitespaceTokens = YES; // tell the tokenizer to preserve/report whitespace

// Grammar
@start = content*;
content = passthru | variable;
passthru = /[^$].*/;
variable = start name end;
start = '${';
end = '}';
name = Word;

Then implement these two callbacks in your assembler:

- (void)parser:(PKParser *)p didMatchName:(PKAssembly *)a {
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a);
    PKToken *tok = [a pop];

    NSString *name = tok.stringValue;
    // do something with name
}

- (void)parser:(PKParser *)p didMatchPassthru:(PKAssembly *)a {
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a);
    PKToken *tok = [a pop];

    NSMutableString *s = a.target;
    if (!s) {
        s = [NSMutableString string];
    }

    [s appendString:tok.stringValue];

    a.target = s;
}

Your client/driver code would then look like this:

NSString *g = ...; // fetch grammar
PKParser *p = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
NSString *s = @"<img src=\"${image}\" />";
NSString *result = [p parse:s]; // parse once and keep the returned result
NSLog(@"result: %@", result);

This will print:

result: <img src="" />
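
The src attribute comes out empty because didMatchName pops the variable name and appends nothing in its place. As a minimal sketch of the "do something with name" step, the assembler below appends a looked-up value to the same target string that didMatchPassthru builds. The TemplateAssembler class, its variables dictionary, and the appendText:toAssembly: helper are hypothetical names introduced for illustration; only the callback naming convention and the PKAssembly/PKToken calls are taken from the code above.

```objective-c
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>

// Hypothetical assembler class (not part of ParseKit).
@interface TemplateAssembler : NSObject
// Maps a variable name to its replacement text (an assumption of this sketch).
@property (nonatomic, copy) NSDictionary *variables;
@end

@implementation TemplateAssembler

// Helper: append text to the assembly's target, creating the target on first use.
- (void)appendText:(NSString *)text toAssembly:(PKAssembly *)a {
    NSMutableString *s = a.target;
    if (!s) {
        s = [NSMutableString string];
    }
    [s appendString:text];
    a.target = s;
}

// Pass-through tokens are copied to the output unchanged.
- (void)parser:(PKParser *)p didMatchPassthru:(PKAssembly *)a {
    PKToken *tok = [a pop];
    [self appendText:tok.stringValue toAssembly:a];
}

// Variable names are replaced by their value, or by an empty string if unknown.
- (void)parser:(PKParser *)p didMatchName:(PKAssembly *)a {
    PKToken *tok = [a pop];
    NSString *value = [self.variables objectForKey:tok.stringValue];
    [self appendText:(value ?: @"") toAssembly:a];
}

@end
```

An instance of this class would be passed as the assembler: argument of parserFromGrammar:assembler: in place of self. Unknown names fall back to an empty string, which mirrors the output shown above.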

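2) Yes, I think using the Tokenizer directly would definitely be better for this relatively simple case. Performance will be much better. Here is how you might approach the task with the Tokenizer:

// s is the input HTML string
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
// Route both quote characters to SymbolState so quoted strings are not swallowed whole
[t setTokenizerState:t.symbolState from:'"' to:'"'];
[t setTokenizerState:t.symbolState from:'\'' to:'\''];
[t.symbolState add:@"${"]; // treat "${" as a single multi-char symbol
t.whitespaceState.reportsWhitespaceTokens = YES; // report whitespace so it survives in the output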

NSMutableString *result = [NSMutableString string];

PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;
while (eof != (tok = [t nextToken])) {
    if ([@"${" isEqualToString:tok.stringValue]) {
        tok = [t nextToken];
        NSString *varName = tok.stringValue;

        // do something with variable
    } else if ([@"}" isEqualToString:tok.stringValue]) {
        // do nothing
    } else {
        [result appendString:tok.stringValue];
    }
}
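
The loop above leaves the "do something with variable" step open, and it also drops every "}" in the input, including any that are not part of a variable. The following is a minimal sketch of one way to complete it, assuming the replacement values come from a dictionary and that every "${" is followed by a name and a closing brace. ExpandTemplate and its vars parameter are hypothetical names introduced here; the tokenizer calls are the same ones used above.

```objective-c
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>

// Hypothetical helper (not part of ParseKit): replaces ${name} with vars[name].
static NSString *ExpandTemplate(NSString *template, NSDictionary *vars) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:template];
    [t setTokenizerState:t.symbolState from:'"' to:'"'];
    [t setTokenizerState:t.symbolState from:'\'' to:'\''];
    [t.symbolState add:@"${"];
    t.whitespaceState.reportsWhitespaceTokens = YES;

    NSMutableString *result = [NSMutableString string];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;

    while (eof != (tok = [t nextToken])) {
        if ([@"${" isEqualToString:tok.stringValue]) {
            // The token after "${" is assumed to be the variable name.
            PKToken *nameTok = [t nextToken];
            if (nameTok == eof) break;
            NSString *value = [vars objectForKey:nameTok.stringValue];
            [result appendString:(value ?: @"")]; // unknown names expand to ""

            // Consume only the brace that closes this variable.
            PKToken *close = [t nextToken];
            if (close == eof) break;
            if (![@"}" isEqualToString:close.stringValue]) {
                [result appendString:close.stringValue];
            }
        } else {
            // Everything else, including a stray "}", passes through unchanged.
            [result appendString:tok.stringValue];
        }
    }
    return result;
}
```

A call such as ExpandTemplate(s, @{@"image": @"logo.png"}) would then return the expanded string. Consuming the closing brace only after a variable name is a design choice of this sketch, so that braces in inline CSS or JavaScript are kept.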