How to avoid special characters with Python regular expressions?

Time: 2017-11-06 12:46:56

Tags: python regex python-2.7

I want to extract the variables (i.e. the names enclosed in single quotes) from the following strings.

Case 1:

string = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="No"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""

When I apply

all_variables = list(set(re.findall("'([^']*)'", string)))

I get the correct result:

all_variables = ['Loss_Ratio','POL_Zero','POL_children']

But in case 2 (when the POL_Zero modality changes):

string = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""

the same regex produces an incorrect result. How can I get the correct result in case 2?

Note that the names cannot contain single or double quotes.

1 answer:

Answer 0 (score: 1)

You can take advantage of the fact that your single-quoted names can contain neither single nor double quotes.

Only under that condition will the regex

"""'([^"']*)'"""

work as expected. See the regex demo.

Here,

  • ' - matches the opening single quote
  • ([^"']*) - Group 1 (if you use re.findall, only this part will be present in the output): zero or more (*) chars other than " and ' ([^"'])
  • ' - matches the closing single quote.
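To see why excluding the double quote from the character class matters, here is a minimal sketch that runs both patterns on the case 2 string. The variable names `case2`, `loose` and `strict` are only illustrative and are not part of the original question.

```python
import re

# Case 2 from the question, written as a triple-quoted raw string so that
# the embedded single and double quotes need no escaping.
case2 = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""

# Pattern from the question: only the single quote is excluded, so the stray
# quote in "Nos' conditional" pairs up with the quote that opens 'POL_children'.
loose = re.findall(r"'([^']*)'", case2)

# Pattern from this answer: a double quote inside the candidate name makes
# the attempt fail, so the stray quote is skipped.
strict = re.findall(r"""'([^"']*)'""", case2)

print(loose)   # third item is a stray fragment, and 'POL_children' is missing
print(strict)  # ['Loss_Ratio', 'POL_Zero', 'POL_children']
```

The stricter pattern relies entirely on the guarantee stated in the question, namely that names never contain quotes of either kind.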

Python demo:

import re
s = """RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="No"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))

RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""
print(re.findall(r"""'([^"']*)'""", s))
# => ['Loss_Ratio', 'POL_Zero', 'POL_children', 'Loss_Ratio', 'POL_Zero', 'POL_children']
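The question wraps the matches in list(set(...)) to drop duplicates, which does not guarantee any particular order. As a possible alternative, the sketch below keeps the first occurrence of each name in the order it appears. The helper name `extract_variables` and the constant `NAME_PATTERN` are hypothetical and were introduced here for illustration only.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
NAME_PATTERN = re.compile(r"""'([^"']*)'""")


def extract_variables(formula):
    """Return the single-quoted names in formula, de-duplicated, in order of first appearance."""
    seen = set()
    ordered = []
    for name in NAME_PATTERN.findall(formula):
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


case2 = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""

# Feeding the same formula twice shows that repeats are dropped.
print(extract_variables(case2 + "\n" + case2))
# ['Loss_Ratio', 'POL_Zero', 'POL_children']
```

The explicit loop is used instead of relying on dictionary ordering so that the sketch behaves the same on Python 2.7, which the question is tagged with.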