I wrote this code to strip parts of a string with regular expressions. Can it be optimized?

Date: 2016-12-28 07:12:15

Tags: regex python-2.7

Input

Office_Employee_19981128_Temp32 

Code

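import re

string = 'Office_Employee_19981128_Temp32 '

# Remove the leading prefix
new_string = re.sub(r'^Office_Employee_', '', string)

# Remove the trailing suffix; \s* allows the trailing space in the input
new_string_2 = re.sub(r'_Temp32\s*$', '', new_string)

print(new_string_2)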

Output

19981128
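The two substitutions can be collapsed into a single re.sub() call by joining the two anchored patterns with |, so the string is scanned once. A minimal sketch, assuming the prefix is always Office_Employee_ and the suffix is always _Temp32 (optionally followed by whitespace, as in the sample input):

```python
import re

string = 'Office_Employee_19981128_Temp32 '

# One pass: delete the prefix at the start OR the suffix (plus trailing whitespace) at the end
new_string = re.sub(r'^Office_Employee_|_Temp32\s*$', '', string)

print(new_string)  # 19981128
```

This still hard-codes the prefix and suffix; if they can vary, extracting the digits directly, as the answer below does, is more robust.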

1 answer:

Answer 0 (score: 2)

Use the re.search() function to extract the number you need directly:

import re

string = 'Office_Employee_19981128_Temp32 '
matches = re.search(r'(?<=_)\d+(?=_)', string)
result = matches.group(0) if matches else ''

print(result)

Output:

19981128
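When the same extraction runs over many strings, the pattern can be compiled once with re.compile() and reused. A minimal sketch; extract_number is a hypothetical helper name, and it keeps the answer's convention of returning '' when nothing matches:

```python
import re

# Digits preceded by "_" and followed by "_"; the lookarounds keep the underscores out of the match
NUMBER_BETWEEN_UNDERSCORES = re.compile(r'(?<=_)\d+(?=_)')

def extract_number(text):
    """Return the first run of digits enclosed in underscores, or '' if there is none (hypothetical helper)."""
    match = NUMBER_BETWEEN_UNDERSCORES.search(text)
    return match.group(0) if match else ''

print(extract_number('Office_Employee_19981128_Temp32 '))  # 19981128
print(repr(extract_number('no digits here')))              # ''
```

The speed-up from precompiling is usually modest, because the re module already caches recently used patterns; the main benefit is keeping the pattern in one named place.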

Edit: the pattern _(\d+)_ can be used as an alternative:

import re

matches = re.search(r'_(\d+)_', 'Office_Employee_19981128_Temp32 ')
result = matches.group(1) if matches else ''
print(result)  # gives the same result as the first approach
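For a single search() the two patterns return the same digits; they differ in that the lookarounds only check the underscores, while _(\d+)_ consumes them and returns the digits through group 1. A sketch comparing them; first_number is a hypothetical helper, and the string 'a_1_2_b' is made up to show where the difference becomes visible with findall():

```python
import re

LOOKAROUND = re.compile(r'(?<=_)\d+(?=_)')  # underscores are checked but not consumed
CAPTURING = re.compile(r'_(\d+)_')          # underscores are consumed; digits are in group 1

def first_number(pattern, group, text):
    """Return the requested group of the first match, or '' if there is none (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(group) if m else ''

for s in ['Office_Employee_19981128_Temp32 ', 'no_match_here']:
    print('%r %r' % (first_number(LOOKAROUND, 0, s), first_number(CAPTURING, 1, s)))

# With findall(), the capturing pattern consumes the underscore shared by neighbouring numbers
print(LOOKAROUND.findall('a_1_2_b'))  # ['1', '2']
print(CAPTURING.findall('a_1_2_b'))   # ['1']
```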

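Execution time measurements (note: everything passed with -s is setup code, which runs once and is not timed; since no statement follows it, timeit times the default statement pass, so these figures measure an empty loop rather than the regex search):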

python3 -m timeit -n 1000  -s  "import re;matches = re.search(r'(?<=_)\d+(?=_)', 'Office_Employee_19981128_Temp32 '); result = matches.group(0) if matches else ''"
1000 loops, best of 3: 0.0147 usec per loop
python3 -m timeit -n 1000 -s  "import re;matches = re.search(r'_(\d+)_', 'Office_Employee_19981128_Temp32 '); result = matches.group(1) if matches else ''"
1000 loops, best of 3: 0.0148 usec per loop

python3 -m timeit -s  "import re;matches = re.search(r'(?<=_)\d+(?=_)', 'Office_Employee_19981128_Temp32 '); result = matches.group(0) if matches else ''"
100000000 loops, best of 3: 0.00708 usec per loop
python3 -m timeit -s  "import re;matches = re.search(r'_(\d+)_', 'Office_Employee_19981128_Temp32 '); result = matches.group(1) if matches else ''"
100000000 loops, best of 3: 0.00717 usec per loop
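To time the search itself, keep only the import and the input in the setup string and pass the regex call as the timed statement. A sketch using the timeit module directly; the statement labels and the loop count N are arbitrary choices, and the resulting figures will vary by machine and Python version:

```python
import timeit

# Setup runs once per repeat and is not timed
SETUP = "import re; s = 'Office_Employee_19981128_Temp32 '"

# Timed statements: each one does the actual regex work
STATEMENTS = {
    'lookaround': r"m = re.search(r'(?<=_)\d+(?=_)', s); r = m.group(0) if m else ''",
    'capturing': r"m = re.search(r'_(\d+)_', s); r = m.group(1) if m else ''",
    'two re.sub calls': r"re.sub(r'_Temp32\s*$', '', re.sub(r'^Office_Employee_', '', s))",
}

N = 100000
results = {}
for name, stmt in sorted(STATEMENTS.items()):
    # Best of 3 repeats, converted to microseconds per loop
    best = min(timeit.repeat(stmt, setup=SETUP, repeat=3, number=N))
    results[name] = best / N * 1e6
    print('%-18s %.3f usec per loop' % (name, results[name]))
```

The command-line equivalent puts the search after the -s option instead of inside it, e.g. python3 -m timeit -s "import re" "re.search(r'_(\d+)_', 'Office_Employee_19981128_Temp32 ')".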