Python regex pattern puzzle

Date: 2018-11-15 08:40:19

Tags: python regex

I want to extract names from the xml files in a local folder, using the following script:

import xml.etree.ElementTree as et
import os, glob, re

in_path = r'D:\B02'
out_path = r'D:\B02\summary.txt'
re_no = 'zi*?.xml'    # pattern passed to glob.glob
re_m = 'zi.*?.xml'    # pattern passed to re.search

def fetch_name(e):
    # return the text of <name> inside the first <PDEheader> element
    for nam in e.findall('PDEheader'):
        return nam.find('name').text

with open(out_path, 'w') as file_add:
    for fileName in glob.glob(os.path.join(in_path, re_no)):
        re_name = fetch_name(et.parse(fileName))
        re_NO = re.search(re_m, fileName).group()
        file_add.write('{}, {}\n'.format(re_NO, re_name))

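With the search pattern re_no='zi*?.xml' in the glob call I get the xml file paths, but with re_no='zi.*?.xml' I get nothing.

On the other hand, with re_m='zi.*?.xml' as the re.search pattern I can find the xml file name, but with re_m='zi*?.xml' I cannot. Can you explain the difference?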

1 Answer:

Answer 0 (score: 2)

If you declare regex patterns, you should get into the habit of using raw strings:

re_no = 'zi*?.xml'      # `z` followed by
                        # as few `i` as possible (zero or more) followed by
                        # any one character (see footnote) followed by
                        # `xml`

re_m = 'zi.*?\.xml'     # `zi` followed by
                        # as few characters as possible (see footnote) followed by
                        # `.xml`   - the . here is literal, not "any character"
                        # (this only works because Python leaves the unknown
                        # escape `\.` untouched; newer versions warn about it)

re_no = r'zi.*?\\.xml'  # similar to the above, but this looks for
                        # `zi` followed by
                        # as few characters as possible (see footnote) followed by
                        # a literal `\` followed by
                        # any one character followed by `xml`
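A small sketch to check these readings with re.search. The path below is a hypothetical file name modeled on the question's D:\B02 folder; it is not from the original post.

```python
import re

# Hypothetical sample path, modeled on the question's D:\B02 folder
path = r'D:\B02\zi123.xml'

# 'zi*?.xml' as a regex: `z`, lazily zero or more `i`, any one char, then `xml`.
# After the `z` comes "i123.xml", which never fits, so there is no match
# (and calling .group() on this result would raise AttributeError).
print(re.search('zi*?.xml', path))              # None

# r'zi.*?\.xml': `zi`, lazily any characters, then a literal ".xml"
print(re.search(r'zi.*?\.xml', path).group())   # zi123.xml

# In a normal string '\\.' is one backslash plus a dot, so the regex engine
# sees `\.` (a literal dot): the same pattern as the raw string above.
print('zi.*?\\.xml' == r'zi.*?\.xml')           # True

# In a raw string r'\\.' keeps both backslashes, so the regex demands a
# literal backslash after "zi...", which this path does not contain.
print(re.search(r'zi.*?\\.xml', path))          # None
```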

Use

re_m = r'zi.*?\.xml'

and use http://regex101.com (switch to the Python flavor) to have your regex explained (in the site's explanation panel) and to test it against test data you provide.
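The glob half of the question behaves differently because glob.glob does not take a regex at all: it matches shell-style wildcards (via fnmatch), where `*` is any run of characters, `?` is exactly one character and `.` is a literal dot. The sketch below uses fnmatch.filter directly; the file names are hypothetical stand-ins for the folder contents.

```python
import fnmatch

# Hypothetical file names standing in for the contents of D:\B02
names = ['zi123.xml', 'zi.xml', 'zi.1.xml', 'summary.txt']

# Wildcards: 'zi*?.xml' = "zi", anything, exactly one more character, ".xml"
print(fnmatch.filter(names, 'zi*?.xml'))    # ['zi123.xml', 'zi.1.xml']

# 'zi.*?.xml' needs a literal dot right after "zi", so 'zi123.xml' is skipped
print(fnmatch.filter(names, 'zi.*?.xml'))   # ['zi.1.xml']

# Assuming any "zi...xml" file is wanted, plain 'zi*.xml' is enough for glob
print(fnmatch.filter(names, 'zi*.xml'))     # ['zi123.xml', 'zi.xml', 'zi.1.xml']
```

So the same string means different things to glob and to re, which is why each pattern only works in one of the two places.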


Footnote

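The "anything" matched by . does not include, for example, newlines unless you pass a flag such as re.DOTALL (re.S); see the re module documentation.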
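A minimal sketch of the footnote, using a hypothetical string with a newline between "zi" and ".xml":

```python
import re

# Hypothetical input with a newline between "zi" and ".xml"
text = 'zi\n.xml'

# Default: `.` does not match "\n", so the lazy `.*?` cannot reach ".xml"
print(re.search(r'zi.*?\.xml', text))   # None

# re.DOTALL (alias re.S) lets `.` match newlines as well
m = re.search(r'zi.*?\.xml', text, re.DOTALL)
print(repr(m.group()))                  # 'zi\n.xml'
```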