Parsing escaped characters

Date: 2011-09-08 14:11:25

Tags: python parsing

Using Python, I'm trying to parse a string like this:

"hello" "I am an example" "the man said:\"hello!\""

into these tokens:

1) hello
2) I am an example
3) the man said: "hello!"

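Something like re.findall(r'"[^"]*"', str) comes close, but it cannot handle the escape character (\). I'm curious what Pythonic way there is to handle the escape character without resorting to a for loop or a large parser package.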
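
To make the failure concrete, here is a minimal sketch of what the naive pattern returns for the example string. The variable names `text` and `naive` are only illustrative.

```python
import re

# The example string from above, written as a raw string so the
# backslashes are kept literally.
text = r'"hello" "I am an example" "the man said:\"hello!\""'

# The naive pattern stops at the first double quote it sees,
# even if that quote is preceded by a backslash.
naive = re.findall(r'"[^"]*"', text)
for m in naive:
    print(m)
```

The third token is cut off at the first escaped quote, and the two trailing quotes are matched as a separate empty token.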

3 answers:

Answer 0 (score: 5)

This is a good fit for a regular expression:

re.findall(r'"(?:\\.|[^"\\])*"', str)

Explanation:

"        # Match a "
(?:      # Match either...
 \\.     # an escaped character (\\, \" etc.)
|        # or
 [^"\\]  # any character except " or \
)*       # any number of times
"        # Match a "

This handles escaped backslashes correctly:

>>> import re
>>> test = r'"hello" "Hello\\" "I am an example" "the man said:\"hello!\\\""'
>>> for match in re.findall(r'"(?:\\.|[^"\\])*"', test):
...     print(match)
...
"hello"
"Hello\\"
"I am an example"
"the man said:\"hello!\\\""

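The matches above still include the surrounding quotes and the backslashes. If the goal is the bare tokens listed in the question, one possible sketch is to add a capturing group to the same pattern and then drop the backslash in front of each escaped character. The names `TOKEN_RE` and `parse_quoted` are hypothetical and not part of the original answer, and the sketch assumes that a backslash simply stands for the character that follows it.

```python
import re

# Same pattern as in the answer, with a capturing group around the body
# so that findall() returns the text between the quotes.
TOKEN_RE = re.compile(r'"((?:\\.|[^"\\])*)"')

def parse_quoted(text):
    """Return the unquoted, unescaped body of each double-quoted token."""
    # Assumption: "\x" means the literal character x (no \n, \t handling).
    return [re.sub(r'\\(.)', r'\1', body) for body in TOKEN_RE.findall(text)]

tokens = parse_quoted(r'"hello" "I am an example" "the man said:\"hello!\""')
for i, tok in enumerate(tokens, 1):
    print(f"{i}) {tok}")
```

With the example input, the third token comes out as the man said:"hello!" with the inner quotes preserved and the backslashes removed.
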
Answer 1 (score: 5)

You can use the Python tokenizer:

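import io
import tokenize

s = r'"hello" "I am an example" "the man said:\"hello!\""'
# generate_tokens() expects a readline callable that returns str lines
tokens = tokenize.generate_tokens(io.StringIO(s).readline)
for tok in tokens:
    # Keep only string literals; skip the NEWLINE and ENDMARKER tokens
    if tok.type == tokenize.STRING:
        print(tok.string)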

This prints:

"hello"
"I am an example"
"the man said:\"hello!\""

This assumes you really want Python's string syntax.
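
If Python's string syntax is indeed what is wanted, the token text can also be turned into the actual string values. The following is a minimal sketch that combines the tokenizer with `ast.literal_eval`. The function name `python_string_tokens` is hypothetical, and the sketch assumes Python 3 and input that consists of valid Python string literals.

```python
import ast
import io
import tokenize

def python_string_tokens(text):
    """Return the value of every Python string literal found in text."""
    readline = io.StringIO(text).readline
    return [
        # literal_eval applies Python's own escape rules (\" \\ \n ...)
        ast.literal_eval(tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]

values = python_string_tokens(r'"hello" "I am an example" "the man said:\"hello!\""')
for value in values:
    print(value)
```

Because Python's escape rules apply, sequences such as \n in the input would become real newlines, which may or may not be what the input format intends.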

Answer 2 (score: -1)

You can remove the '\' characters from the original string before applying the regex:

input_string = r'"hello" "I am an example" "the man said:\"hello!\""'
input_string.replace('\', '')

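Edit: the '\' has to be escaped, because a lone backslash would escape the closing quote of the string literal. This should be written as: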

input_string = input_string.replace('\\', '')