Lisp tokenizer over a character stream in Python

Date: 2017-08-31 07:33:20

Tags: python parsing lisp tokenize

I enjoyed reading Peter Norvig's cheap-and-cheerful Lisp interpreter here:

http://norvig.com/lispy.html

In his code, he uses this very simple function to tokenize the input Lisp code:

def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()
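To make the behaviour concrete, here is a quick check of Norvig's one-liner on a small expression (the example input is mine, not from the post):

```python
def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

# Padding the parentheses with spaces lets split() do all the work.
print(tokenize('(+ 1 2)'))
# → ['(', '+', '1', '2', ')']
```

The catch is that it needs the whole input as a single string up front, which is what motivates the streaming version below.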

I would like to rewrite it as a generator that operates on a stream, along these lines:

def tokenize(stream):
    "Generate a stream of tokens from a stream of characters."
    # Do something here
    yield token  # This would be wrapped in a loop

I sketched out a state machine and started implementing it, but it quickly became more complicated than expected. Is there a simpler approach I am missing?

1 Answer:

Answer 0 (score: 1)

I tried again and came up with this. It has not been tested thoroughly, but it seems to work so far.

def tokenise(char_stream):
    "Generate a stream of tokens from a stream of characters."
    c = char_stream.read(1)
    accumulated = []

    while c:
        c_isbracket = c in '()'
        if not c.isspace() and not c_isbracket:
            accumulated.append(c)
        else:
            if accumulated:
                token_str = ''.join(accumulated)
                accumulated = []
                yield token_str
            if c_isbracket:
                yield c

        c = char_stream.read(1)

    if accumulated:
        # Flush a final token that runs right up to the end of the stream,
        # otherwise it would be silently dropped.
        yield ''.join(accumulated)
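The generator works with any object that supports `read(1)`, so a file object or an in-memory stream both do. A self-contained demo (repeating the generator so the snippet runs on its own, and using `io.StringIO` purely for illustration):

```python
import io

def tokenise(char_stream):
    "Generate a stream of tokens from a stream of characters."
    c = char_stream.read(1)
    accumulated = []
    while c:
        c_isbracket = c in '()'
        if not c.isspace() and not c_isbracket:
            accumulated.append(c)       # still inside an atom
        else:
            if accumulated:             # delimiter ends the current atom
                yield ''.join(accumulated)
                accumulated = []
            if c_isbracket:             # brackets are tokens themselves
                yield c
        c = char_stream.read(1)
    if accumulated:                     # flush a trailing atom at EOF
        yield ''.join(accumulated)

print(list(tokenise(io.StringIO('(+ 1 (* 2 3))'))))
# → ['(', '+', '1', '(', '*', '2', '3', ')', ')']
```

Because it reads one character at a time, memory use stays constant no matter how large the input is, at the cost of one `read` call per character.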