Using re.Scanner to find material inside double quotes?

Date: 2014-03-30 00:19:33

Tags: python regex string double-quotes

Is there a way to use the following (undocumented) re.Scanner to find everything inside double quotes, so that such a match is classified as a string?

    scanner = re.Scanner([
        (r"[-10-9]+", lambda scanner, token: ("INTEGER", int(token))),
        (r"[A-Za-z]+", lambda scanner, token: ("NAME", str(token))),
        (r"[:true::false:]+", lambda scanner, token: ("BOOL", token)),
        (r"[:error:]+", lambda scanner, token: ("ERROR", token)),
        (r'.', lambda scanner, token: None),
    ])

1 Answer:

Answer 0 (score: 1):

You can simply add a string regex to the scanner, like this:

    >>> import re
    >>> scanner = re.Scanner([
    ...     (r"[-10-9]+", lambda scanner, token: ("INTEGER", int(token))),
    ...     (r"[A-Za-z]+", lambda scanner, token: ("NAME", str(token))),
    ...     (r"[:true::false:]+", lambda scanner, token: ("BOOL", token)),
    ...     (r"[:error:]+", lambda scanner, token: ("ERROR", token)),
    ...     (r'".*?"', lambda scanner, token: ("STRING", token)),  # added STRING regex
    ...     (r'.', lambda scanner, token: None),
    ... ])

Now you can test it:

    >>> i = '"string"'  # simulated input
    >>> t = '"this is a very long string with whitespace"'  # another simulated input
    >>> # scan() returns ([(token_label, match), ...], remainder_of_string)
    >>> scanner.scan(i)
    ([('STRING', '"string"')], '')
    >>> scanner.scan(t)
    ([('STRING', '"this is a very long string with whitespace"')], '')
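A side note on the other patterns: `[-10-9]+`, `[:true::false:]+` and `[:error:]+` are character classes, so they match runs of individual characters rather than the words `true`, `false` or `error`. Because re.Scanner uses the first pattern that matches at each position, `[A-Za-z]+` would also claim those words before the BOOL rule is tried. Below is a minimal sketch of a variant that addresses this. The rewritten patterns, the rule order, and the choice to strip the surrounding quotes are my assumptions about the intent, not part of the original code.

```python
import re

# Hypothetical variant of the scanner above. The patterns below are
# assumptions about the intended token classes, not the asker's code.
scanner = re.Scanner([
    # Double-quoted string; backslash escapes such as \" are allowed inside.
    # The surrounding quotes are stripped from the token value.
    (r'"(?:[^"\\]|\\.)*"', lambda s, tok: ("STRING", tok[1:-1])),
    # Optional minus sign followed by one or more digits.
    (r"-?[0-9]+", lambda s, tok: ("INTEGER", int(tok))),
    # Keywords are listed before NAME because the first matching rule wins.
    (r"(?:true|false)\b", lambda s, tok: ("BOOL", tok == "true")),
    (r"error\b", lambda s, tok: ("ERROR", tok)),
    (r"[A-Za-z]+", lambda s, tok: ("NAME", tok)),
    # Any other single character (whitespace, punctuation) is skipped.
    (r".", None),
])

tokens, remainder = scanner.scan('x = "hello world" 42 true')
print(tokens)     # [('NAME', 'x'), ('STRING', 'hello world'), ('INTEGER', 42), ('BOOL', True)]
print(remainder)  # empty string: the whole input was consumed
```

Passing `None` as the action has the same effect as a lambda that returns `None`: the match is consumed but no token is emitted. The groups are written as non-capturing `(?:...)` on purpose, since re.Scanner relies on group numbering to decide which rule matched.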