Python 3.0 - tokenize and untokenize

Date: 2009-06-01 12:41:22

Tags: python python-3.x tokenize lexical-analysis

I'm using something similar to the following simplified script to parse snippets of Python out of a larger file:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)

src = list(tokenize.tokenize(src.readline))

for tok in src:
    print(tok)

src = tokenize.untokenize(src)

Although the equivalent code in Python 2.x isn't identical, it uses the same idiom and works fine. Running the snippet above under Python 3.0, however, produces this output:

(57, 'utf-8', (0, 0), (0, 0), '')
(1, 'foo', (1, 0), (1, 3), 'foo="bar"')
(53, '=', (1, 3), (1, 4), 'foo="bar"')
(3, '"bar"', (1, 4), (1, 9), 'foo="bar"')
(0, '', (2, 0), (2, 0), '')

Traceback (most recent call last):
  File "q.py", line 13, in <module>
    src = tokenize.untokenize(src)
  File "/usr/local/lib/python3.0/tokenize.py", line 236, in untokenize
    out = ut.untokenize(iterable)
  File "/usr/local/lib/python3.0/tokenize.py", line 165, in untokenize
    self.add_whitespace(start)
  File "/usr/local/lib/python3.0/tokenize.py", line 151, in add_whitespace
    assert row <= self.prev_row
AssertionError
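For reference, the numeric token types in that output can be looked up in tokenize.tok_name (a minimal sketch; the exact numbers vary between Python versions, and these are the ones from the run above):

import tokenize

# Decode the token-type numbers from the output above; on the Python 3.0
# run shown there, 57=ENCODING, 1=NAME, 53=OP, 3=STRING, 0=ENDMARKER.
for num in (57, 1, 53, 3, 0):
    print(num, tokenize.tok_name.get(num, '<unknown>'))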

I've searched for references to this error and its cause, but haven't been able to find any. What am I doing wrong, and how can I correct it?

[Edit]

Following partisann's observation that appending a newline to the source makes the error go away, I started experimenting with the list I was untokenizing. It seems that the EOF token causes the error if it isn't immediately preceded by a newline, so removing it gets rid of the error. The following script runs without error:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)

src = list(tokenize.tokenize(src.readline))

for tok in src:
    print(tok)

# drop the trailing ENDMARKER (EOF) token before untokenizing
src = tokenize.untokenize(src[:-1])
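A slightly more defensive variant of the same workaround (a sketch, not from the original post) filters by token type instead of assuming the EOF token is always last; tokens_without_endmarker is a hypothetical helper name:

import io
import tokenize

def tokens_without_endmarker(source):
    # Tokenize `source` (a str) and drop ENDMARKER tokens by type
    # rather than by position.
    toks = tokenize.tokenize(io.BytesIO(source.encode()).readline)
    return [t for t in toks if t[0] != tokenize.ENDMARKER]

src = tokenize.untokenize(tokens_without_endmarker('foo="bar"'))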

2 Answers:

Answer 0 (score: 3):

src = 'foo="bar"\n'
You forgot the newline.
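In other words, with the trailing newline the round trip succeeds. A minimal sketch of the fixed script:

import io
import tokenize

src = 'foo="bar"\n'  # note the trailing newline
toks = list(tokenize.tokenize(io.BytesIO(src.encode()).readline))
out = tokenize.untokenize(toks)
# untokenize returns bytes here because the stream starts with an
# ENCODING token; decode it to compare with the original source
assert out.decode('utf-8') == src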

Answer 1 (score: 0):

It seems to work if you limit the input to untokenize to the first two items of each token:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)

src = list(tokenize.tokenize(src.readline))

for tok in src:
    print(tok)

# keep only (type, string) from each token; untokenize accepts
# sequences with at least two elements
src = [t[:2] for t in src]
src = tokenize.untokenize(src)
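With only (type, string) pairs, untokenize ignores the position information, so the exact whitespace of the original may not be reproduced, but the documentation guarantees the result tokenizes back to a matching stream. A quick check (a sketch reusing the variables above, where src now holds the bytes returned by untokenize):

# Re-tokenize the regenerated source and print the (type, string) pairs;
# they should match the pairs fed into untokenize, modulo any implicit
# NEWLINE token the tokenizer adds, even if the spacing differs.
again = [t[:2] for t in tokenize.tokenize(io.BytesIO(src).readline)]
print(again)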