Question

I am trying to parse a file like this:

while (true){
    print("hello world")
}

It isn't Python syntax, but I'm parsing it with Python. My code is:

import re

words = []
for line in lines:  # lines holds the lines of the file shown above
    words += re.sub(r"[\s]", " ", line).split()

My result is:

['while', '(true){', 'print("hello', 'world")', '}']

That's nice, since I only use re and the [\s] regular expression, but how can I get a result like this:

['while', '(', 'true', ')', '{'....]

so that every symbol becomes its own token (assuming I have a string that contains them one after another, e.g. symbols = '(){}:,=+-')?

Answer 1

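You can use re.split with a capturing group to get both the text between separators and the separators themselves: when the pattern contains a capturing group, re.split includes the matched separators in the result list.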

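For example, the symbols can be matched with the regex r'\W+', which matches any run of non-word characters (note that this includes whitespace).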
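A quick check of what \W actually matches may help here. The sample string below is made up for illustration; it shows that the underscore counts as a word character, while spaces are matched by \W just like punctuation:

```python
import re

# \W is the complement of \w: it matches any character that is not a letter,
# digit, or underscore, so spaces count as "symbols" too.
sample = "my_var = f(x) + 1"

single = re.findall(r"\W", sample)  # one match per non-word character
runs = re.findall(r"\W+", sample)   # adjacent non-word characters grouped

print(single)  # [' ', '=', ' ', '(', ')', ' ', '+', ' ']
print(runs)    # [' = ', '(', ') + ']
```

This is why the \W+ results below contain tokens with embedded spaces, such as ' (' or ' '.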

Here is an example:

import re

code = """\
while (true){
    print("hello world")
}
"""

for line in code.splitlines():
    print(re.split(r"(\W+)", line))

You will get:

['while', ' (', 'true', '){', '']
['', '    ', 'print', '("', 'hello', ' ', 'world', '")', '']
['', '}', '']

By filtering the list, you can drop the empty strings...
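A minimal sketch of that filtering step, reusing the same `code` string: each token is stripped and kept only if something remains. With \W+, adjacent symbols such as '){' stay together as one token, which is what the single-character pattern addresses.

```python
import re

code = """\
while (true){
    print("hello world")
}
"""

results = []
for line in code.splitlines():
    # Keep the captured separators, strip surrounding whitespace,
    # and drop tokens that end up empty.
    tokens = [t.strip() for t in re.split(r"(\W+)", line) if t.strip()]
    results.append(tokens)
    print(tokens)

# ['while', '(', 'true', '){']
# ['print', '("', 'hello', 'world', '")']
# ['}']
```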

Alternatively, if you need to match single-character symbols, you can do:

for line in code.splitlines():
    tokens = [token for token in re.split(r"(\W)", line) if token.strip()]
    print(tokens)

You get:

['while', '(', 'true', ')', '{']
['print', '(', '"', 'hello', 'world', '"', ')']
['}']
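The question mentions having the symbols in a string such as symbols = '(){}:,=+-'. A minimal sketch, assuming that hypothetical `symbols` string, builds a pattern from it with re.escape and collects tokens with re.findall instead of re.split. Characters that are neither word characters nor listed in `symbols` (here the double quotes) are simply skipped, so add them to `symbols` if you need them.

```python
import re

# Hypothetical symbol set, taken from the example string in the question.
symbols = '(){}:,=+-'

# \w+ matches a run of word characters; the character class matches exactly
# one symbol. re.escape keeps characters such as '-' and '(' literal inside [].
token_re = re.compile(r"\w+|[" + re.escape(symbols) + r"]")

source = """\
while (true){
    print("hello world")
}
"""

# findall skips anything the pattern does not match, including newlines.
words = token_re.findall(source)
print(words)
# ['while', '(', 'true', ')', '{', 'print', '(', 'hello', 'world', ')', '}']
```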

Answer 2

Try this:

import re

re1 = r'(.?)([(){}:,=+-])(.?)'

lines = '''
while (true){
    print("hello world")
}
'''

words = []
for line in lines.split('\n'):  # split the text into individual lines
    cleanLine = re.sub(re1, r'\g<1> \g<2> \g<3>', line)  # put spaces around each symbol
    words += re.sub(r"[\s]", " ", cleanLine).split()

print(words)
# ['while', '(', 'true', ')', '{', 'print', '(', '"hello', 'world"', ')', '}']
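With the three-group pattern, re.sub can consume the character after a symbol as group 3, so a word character that follows two consecutive symbols may stay attached: for instance, `a()b` becomes `a ( )b`, which splits into `['a', '(', ')b']`. A simpler alternative (not the answer's original pattern) pads every symbol with spaces using a single capture group; `tokenize` is a hypothetical helper name:

```python
import re

# Character class of the symbols from the question; '-' is last, so it is literal.
SYMBOL_RE = re.compile(r"([(){}:,=+-])")

def tokenize(text):
    # Surround every symbol with spaces, then split on any whitespace
    # (str.split() with no argument also handles newlines).
    return SYMBOL_RE.sub(r" \1 ", text).split()

print(tokenize('while (true){\n    print("hello world")\n}\n'))
# ['while', '(', 'true', ')', '{', 'print', '(', '"hello', 'world"', ')', '}']
print(tokenize("a()b"))
# ['a', '(', ')', 'b']
```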

Parsing a line containing special symbols with Python's re
