python regex: replace every occurrence of a word that starts with ":" when the next character is a letter

Time: 2019-02-25 08:58:45

Tags: python regex

I have an SQL select (sel = "sql select text") in which variables are written as :var_name

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel
            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
            and ( domain_name in (:domain) or 'All domains' in (:domain) )
            and (technology in (:technology) or 'All' in (:technology))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) 
            or 'All' in (:application) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=:step_type and ...

Variables start with ":" and end with a parenthesis or a space. I have to replace every :var_name with ${var_name}.

Currently I am using re.sub(r":(\w+)", r"${\1}", sel), which gives:

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10${11}${12}') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application}) 
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...

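Everything works except for the date constant 2019-01-01 10:11:12. Because it contains ":" characters, the digits after each colon are treated as variable names.

The replacement should happen only when the character right after ":" is a letter.

How can I do that?

3 answers:

Answer 0 (score: 1)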

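You can use this regex, which uses a positive lookahead so that only variables followed by a space, a ")", a newline, or the end of the string are selected: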

:(\w+)(?=[ )\n]|$)

Demo

See this Python code:

import re

s = '''select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel
            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
            and ( domain_name in (:domain) or 'All domains' in (:domain) )
            and (technology in (:technology) or 'All' in (:technology))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) 
            or 'All' in (:application) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=:step_type and ...:DatumOd
:DatumOd'''

print(re.sub(r':(\w+)(?=[ )\n]|$)', r'${\1}',s))

It replaces only the intended variables and leaves the colons inside the date untouched:

select  ... from (
select  ... from (
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application})
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...${DatumOd}
${DatumOd}
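
A minimal sketch of how this lookahead behaves on a few edge cases. The names LOOKAHEAD and substitute and the sample strings are made up for illustration; only the pattern itself comes from the answer above. It suggests the pattern decides by what follows the name, not by whether the name starts with a letter:

```python
import re

# Pattern from the answer above: the name must be followed by a space,
# ')', a newline, or the end of the string.
LOOKAHEAD = r':(\w+)(?=[ )\n]|$)'


def substitute(sql):
    """Hypothetical helper: wrap every matched :name as ${name}."""
    return re.sub(LOOKAHEAD, r'${\1}', sql)


# The time literal is left alone: ':11' is followed by ':' and ':12' by a quote.
time_literal = substitute("toDateTime('2019-01-01 10:11:12')")

# A purely numeric "name" followed by a space is still replaced.
digits_name = substitute("limit :123 ")

# A variable followed by a comma is not replaced, because ',' is not in the lookahead set.
comma_after = substitute("f(:a, :b)")

print(time_literal)
print(digits_name)
print(comma_after)
```

If variables can be followed by commas or other punctuation in your SQL, the character class in the lookahead would need to be extended accordingly.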

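Answer 1 (score: 1)

You can try this pattern: '\W:(\w+)', so that the text after a colon is selected only when the colon does not follow a word character. It works for this example, but I am not sure it is enough for the general case.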
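
A hedged sketch of how that pattern could be used with re.sub. Because \W consumes the character before the colon, this version assumes an extra capture group (not part of the original pattern) so that the character can be put back in the replacement. The helper name substitute and the sample strings are illustrative only:

```python
import re


def substitute(sql):
    """Hypothetical helper built on the '\\W:(\\w+)' idea."""
    # Group 1 keeps the non-word character before the colon,
    # group 2 is the variable name.
    return re.sub(r'(\W):(\w+)', r'\1${\2}', sql)


replaced = substitute(
    "where t >= toDateTime('2019-01-01 10:11:12') "
    "and m=:channel and d in (:domain)"
)

# A variable at the very start of the string has no preceding character,
# so \W cannot match and the variable is left as-is.
at_start = substitute(":first and (:second)")

print(replaced)
print(at_start)
```

As the second call shows, a variable at position 0 is missed, which is one reason the pattern may not be general enough.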

Answer 2 (score: 0)

Based on your requirements, you can use

s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s)

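See the regex demo

Details

  • \B: - a : that is not preceded by a word character (or that is at the start of the string)
  • ([^\W\d_]\w*) - Group 1 (\1 in the replacement pattern):
    • [^\W\d_] - any letter
    • \w* - zero or more letters, digits, or underscores.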
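
The pieces described above can be checked one by one with a small sketch. PATTERN and substitute are illustrative names and the inputs are made-up fragments, not taken from the question:

```python
import re

PATTERN = r'\B:([^\W\d_]\w*)'


def substitute(text):
    """Hypothetical helper: wrap every matched :name as ${name}."""
    return re.sub(PATTERN, r'${\1}', text)


# A letter-initial name after a non-word character is replaced.
normal = substitute("(:x1)")

# \B fails between a digit and ':', so a time literal is untouched.
time_literal = substitute("10:11:12")

# [^\W\d_] rejects a leading digit and a leading underscore.
leading_digit = substitute("(:1abc)")
leading_underscore = substitute("(:_x)")

# \B also fails between a letter and ':'.
after_letter = substitute("a:b")

# At the start of the string \B succeeds, because ':' is not a word character.
at_start = substitute(":start")

print(normal, time_literal, leading_digit, leading_underscore, after_letter, at_start)
```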

Note: if you want to match only ASCII letters and digits and you are using Python 3.x, use the re.A or re.ASCII flag:

s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s, flags=re.A)
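
To illustrate what the flag changes, here is a small sketch comparing the default (Unicode) mode with re.A. The sample string with an accented variable name is an assumption made up for the comparison:

```python
import re

PATTERN = r'\B:([^\W\d_]\w*)'
sample = "(:été) (:name)"

# Default in Python 3: \w and \W are Unicode-aware, so 'é' counts as a letter.
unicode_mode = re.sub(PATTERN, r'${\1}', sample)

# With re.A, 'é' is a non-word character, so ':été' does not start with
# an (ASCII) letter and is left unchanged.
ascii_mode = re.sub(PATTERN, r'${\1}', sample, flags=re.A)

print(unicode_mode)
print(ascii_mode)
```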

Python demo

import re
s = "select  ... from (\r\nselect  ... from ( \r\nselect  ... from  table1\r\nwhere session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)\r\nand session_id in (select distinct ...  from table2\r\n    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel\r\n            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)\r\n            and ( domain_name in (:domain) or 'All domains' in (:domain) )\r\n            and (technology in (:technology) or 'All' in (:technology))\r\n            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) \r\n            or 'All' in (:application) )  )\r\norder by session_id desc , execution_id desc, step_started desc, step_id desc)\r\n) where step_type=:step_type and ..."
s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s, flags=re.A)
print(s)

Output:

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application}) 
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...