Is there a way to use the following (undocumented) re.Scanner to find everything inside double quotes, so that such matches are classified as strings?
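import re

scanner = re.Scanner([
(r"[-10-9]+", lambda scanner, token:("INTEGER", int(token))),  # runs of digits and '-'
(r"[A-Za-z]+", lambda scanner, token:("NAME", str(token))),
(r"[:true::false:]+", lambda scanner, token:("BOOL", token)),  # character class: any run of : t r u e f a l s
(r"[:error:]+", lambda scanner, token:("ERROR", token)),  # character class: any run of : e r o
(r'.', lambda scanner, token: None),  # skip any other single character
])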
Answer:
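You can simply add a regex for double-quoted strings to the scanner. Put it before the catch-all r'.' rule: the scanner combines the patterns into one alternation and tries them in list order at each position, so if '.' came first it would consume the opening quote on its own.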
>>> import re
>>> scanner = re.Scanner([
(r"[-10-9]+", lambda scanner, token:("INTEGER", int(token))),
(r"[A-Za-z]+", lambda scanner, token:("NAME", str(token))),
(r"[:true::false:]+", lambda scanner, token:("BOOL", token)),
(r"[:error:]+", lambda scanner, token:("ERROR", token)),
(r'".*?"', lambda scanner, token:("STRING", token)), # added STRING regex
(r'.', lambda scanner, token: None),
])
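The ? in ".*?" makes the match non-greedy: it stops at the first closing quote, so two strings on one line become two separate tokens. A greedy ".*" would instead run to the last quote on the line. A small sketch comparing the two; make_scanner is a hypothetical helper that keeps only a STRING rule and the catch-all rule:

```python
import re

def make_scanner(string_pattern):
    # Hypothetical helper: a scanner with only a STRING rule and a skip rule
    return re.Scanner([
        (string_pattern, lambda scanner, token: ("STRING", token)),
        (r".", lambda scanner, token: None),  # skip any other single character
    ])

text = '"a" and "b"'
lazy, _ = make_scanner(r'".*?"').scan(text)
greedy, _ = make_scanner(r'".*"').scan(text)
print(lazy)    # [('STRING', '"a"'), ('STRING', '"b"')]
print(greedy)  # [('STRING', '"a" and "b"')]
```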
Now you can test the scanner:
>>> i = '"string"' # simulated input
>>> t = '"this is a very long string with whitespace"' # another simulated input
>>> scanner.scan(i)
([('STRING', '"string"')], '') # ([(token_label, match)], remainder_of_string)
>>> scanner.scan(t)
([('STRING', '"this is a very long string with whitespace"')], '')
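One limitation: ".*?" ends at the first quote it sees, so on input like say "he said \"hi\"" done it would stop at the escaped quote and produce "he said \" as the token. Also, '.' does not match a newline by default, so ".*?" cannot span lines. If your input can contain escaped quotes, a pattern such as "(?:[^"\\]|\\.)*" is one option. A sketch, assuming a backslash is the only escape character:

```python
import re

# Assumption: a backslash is the only escape character inside strings.
# [^"\\] matches any character except a quote or a backslash (newlines included);
# \\. matches a backslash plus the character after it, e.g. \" or \\
STRING_RE = r'"(?:[^"\\]|\\.)*"'

scanner = re.Scanner([
    (STRING_RE, lambda scanner, token: ("STRING", token)),
    (r"[A-Za-z]+", lambda scanner, token: ("NAME", token)),
    (r".", lambda scanner, token: None),  # skip any other single character
])

text = r'say "he said \"hi\"" done'
tokens, remainder = scanner.scan(text)
print(tokens)
# [('NAME', 'say'), ('STRING', '"he said \\"hi\\""'), ('NAME', 'done')]
print(repr(remainder))  # '' because the whole input was consumed
```

The token still includes the surrounding quotes and the escape backslashes; slice with token[1:-1] if you only want the contents.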