Using Python, I'm trying to parse a string like this:
"hello" "I am an example" "the man said:\"hello!\""
into these tokens:
1) hello
2) I am an example
3) the man said: "hello!"
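Something like
re.findall(r'"[^"]*"', str)
comes close, but it cannot handle the escape character (\). I'm curious what Pythonic ways there are to handle the escape character without resorting to a for loop or a large parser package.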
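To make the failure concrete, here is a small sketch that runs the simple pattern on the sample input. The variable names `sample` and `naive_matches` are only illustrative, and the input is written as a raw string on the assumption that the backslashes are really present in the data.

```python
import re

# Sample input from the question; the raw string keeps the backslashes.
sample = r'"hello" "I am an example" "the man said:\"hello!\""'

# The simple pattern ends a token at the first quote it meets, escaped or not.
naive_matches = re.findall(r'"[^"]*"', sample)
for match in naive_matches:
    print(match)
```

The third token is cut off right after the backslash, and the leftover pair of quotes at the end shows up as an empty fourth match.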
Answer 0 (score: 5)
This is a good fit for a regular expression:
re.findall(r'"(?:\\.|[^"\\])*"', str)
Explanation:
" # Match a "
(?: # Match either...
\\. # an escaped character (\\, \" etc.)
| # or
[^"\\] # any character except " or \
)* # any number of times
" # Match a "
This handles escaped backslashes correctly:
>>> import re
>>> test = r'"hello" "Hello\\" "I am an example" "the man said:\"hello!\\\""'
>>> for match in re.findall(r'"(?:\\.|[^"\\])*"', test):
... print(match)
...
"hello"
"Hello\\"
"I am an example"
"the man said:\"hello!\\\""
Answer 1 (score: 5)
You can use the Python tokenizer:
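import io
import tokenize

s = r'"hello" "I am an example" "the man said:\"hello!\""'
tokens = tokenize.generate_tokens(io.StringIO(s).readline)
for tok in tokens:
    # Keep only string literals; the tokenizer also emits NEWLINE and ENDMARKER.
    if tok.type == tokenize.STRING:
        print(tok.string)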
This prints:
"hello"
"I am an example"
"the man said:\"hello!\""
This assumes you really want Python's syntax for string literals.
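If Python's string syntax is indeed what you want, each token can also be turned into its actual value with `ast.literal_eval`, which removes the quotes and resolves the escapes using Python's own rules. A minimal Python 3 sketch, with `values` as an illustrative name:

```python
import ast
import io
import tokenize

s = r'"hello" "I am an example" "the man said:\"hello!\""'

values = [
    ast.literal_eval(tok.string)        # evaluate the literal to its str value
    for tok in tokenize.generate_tokens(io.StringIO(s).readline)
    if tok.type == tokenize.STRING      # skip NEWLINE and ENDMARKER tokens
]
print(values)
```

Unlike a plain backslash-stripping pass, this would also interpret sequences such as `\n` or `\t` the way Python does, which may or may not be what the input format intends.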
Answer 2 (score: -1)
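You could remove the '\' characters from the original string before applying the regex:
input_string = r'"hello" "I am an example" "the man said:\"hello!\""'
input_string = input_string.replace('\\', '')
Edit: the backslash has to be escaped inside the string literal. Writing '\' on its own is a syntax error, because the backslash escapes the closing quote, so the call must use '\\'. Note also that replace() returns a new string, so the result has to be assigned.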
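One caveat worth checking before relying on this: once the backslashes are gone, the inner quotes can no longer be told apart from the delimiters. A quick sketch, assuming the simple pattern from the question is the regex applied afterwards (`raw` and `stripped` are illustrative names):

```python
import re

raw = r'"hello" "I am an example" "the man said:\"hello!\""'

# Drop every backslash first, then apply the simple pattern from the question.
stripped = raw.replace('\\', '')
print(stripped)
print(re.findall(r'"[^"]*"', stripped))
```

The last token is split at the inner quote, so this approach only works for input that contains no escaped quotes in the first place.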