Question

Using Python, I am trying to parse a string like this:

"hello" "I am an example" "the man said:\"hello!\""

into these tokens:

1) hello
2) I am an example
3) the man said:"hello!"

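Something like re.findall(r'"[^"]*"', str) comes close, but it can't handle the escape character (\). I'm curious whether there is a Pythonic way to handle escape characters without writing a for loop or pulling in a large parser package.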

Answer 1

This is a perfect fit for a regular expression:

re.findall(r'"(?:\\.|[^"\\])*"', str)

Explanation:

"        # Match a "
(?:      # Match either...
 \\.     # an escaped character (\\, \" etc.)
|        # or
 [^"\\]  # any character except " or \
)*       # any number of times
"        # Match a "
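The commented breakdown above can be handed to Python almost verbatim by compiling it with the re.VERBOSE flag, which ignores unescaped whitespace and treats text after # as a comment. A minimal sketch; the name QUOTED is only an illustrative choice, and the comments avoid ending a line with a backslash, which would glue the next line onto the comment in verbose mode:

```python
import re

# The same pattern as above, written with re.VERBOSE so that
# whitespace is ignored and everything after "#" is a comment.
QUOTED = re.compile(r'''
    "            # opening quote
    (?:          # then either...
        \\.      #   a backslash followed by any character (an escape)
      |          # or
        [^"\\]   #   any character that is neither a quote nor a backslash
    )*           # repeated any number of times
    "            # closing quote
''', re.VERBOSE)

test = r'"hello" "I am an example" "the man said:\"hello!\""'
print(QUOTED.findall(test))
```

It matches exactly what the one-line pattern matches; the verbose form just keeps the documentation next to the regex.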

This also handles escaped backslashes correctly:

>>> import re
>>> test = r'"hello" "Hello\\" "I am an example" "the man said:\"hello!\\\""'
>>> for match in re.findall(r'"(?:\\.|[^"\\])*"', test):
...     print(match)
...
"hello"
"Hello\\"
"I am an example"
"the man said:\"hello!\\\""
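The matches above still include the surrounding quotes and the backslash escapes. If you want the bare tokens listed in the question, one option (a sketch that assumes "unescape" simply means "drop the backslash", which is not part of the original answer) is to capture the text between the quotes with a group and then remove the backslash from each escape pair. The helper name parse_tokens is hypothetical:

```python
import re

# Capture only the text between the quotes (group 1).
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')
# A backslash followed by any character: keep just that character.
ESCAPE = re.compile(r'\\(.)')

def parse_tokens(text):
    """Return the quoted tokens in text with their escapes resolved."""
    return [ESCAPE.sub(r'\1', body) for body in QUOTED.findall(text)]

s = r'"hello" "I am an example" "the man said:\"hello!\""'
for token in parse_tokens(s):
    print(token)
```

Because the backslash is simply dropped, an escape like \n comes out as a plain n rather than a newline.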

Answer 2

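You can use Python's own tokenizer (the tokenize module):

import io
import tokenize
s = r'"hello" "I am an example" "the man said:\"hello!\""'
sio = io.StringIO(s)
for tok in tokenize.generate_tokens(sio.readline):
    if tok.type == tokenize.STRING:  # skip the trailing NEWLINE/ENDMARKER tokens
        print(tok.string)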

Output:

"hello"
"I am an example"
"the man said:\"hello!\""

This assumes you really do want Python's string syntax.
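The tokenizer returns each string in its source form, quotes and escapes included. One way to turn those tokens into their actual values is to pass each STRING token through ast.literal_eval, which safely evaluates a Python literal. A minimal sketch, assuming the input follows Python string syntax; the function name python_strings is hypothetical:

```python
import ast
import io
import tokenize

def python_strings(text):
    """Yield the value of every string-literal token in text."""
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            # literal_eval strips the quotes and resolves escape sequences.
            yield ast.literal_eval(tok.string)

s = r'"hello" "I am an example" "the man said:\"hello!\""'
print(list(python_strings(s)))
```

Unlike a hand-written unescape step, this gives escapes such as \n their full Python meaning.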

Answer 3

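You could remove the '\' characters from the original string before applying the regex:

input_string = r'"hello" "I am an example" "the man said:\"hello!\""'
input_string = input_string.replace('\\', '')

Note that the backslash in the replace() argument has to be escaped: '\' on its own is a syntax error, because the backslash escapes the closing quote. Also, replace() returns a new string, so the result must be assigned, and the input must be a raw string (r'...'); in an ordinary literal, \" is already just " and there is no backslash left to remove.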
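Combined with the regex from the question, the approach can be checked as follows. The sketch uses a raw string literal so that the input actually contains backslashes, and the variable name stripped is only illustrative. Once the backslashes are removed, the inner quotes look the same as the delimiters, which the final findall call makes visible:

```python
import re

# Raw string, so the backslashes really are part of the text.
input_string = r'"hello" "I am an example" "the man said:\"hello!\""'
# str.replace returns a new string; '\\' is a single backslash.
stripped = input_string.replace('\\', '')
print(stripped)
# The question's original pattern, applied to the stripped text.
print(re.findall(r'"[^"]*"', stripped))
```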

Parsing escape characters
