Question

I want to extract names from the XML files in a local folder, using the following script:

import xml.etree.ElementTree as et
import os, glob, re    
in_path = r'D:\B02'
out_path = r'D:\B02\summary.txt'
re_no = 'zi*?.xml'
re_m = 'zi.*?.xml'
def fetch_name(e):
    for nam in e.findall('PDEheader'):
        return nam.find('name').text
file_add = open(str(out_path), 'w')
for fileName in glob.glob(os.path.join(str(in_path), re_no)):
    re_name=fetch_name(et.parse(fileName))
    re_NO = re.search(re_m, fileName).group()
    file_add.write('{}, {}\n'.format(re_NO, re_name))   
file_add.close()

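With glob, the search pattern re_no = 'zi*?.xml' finds the XML file paths, but re_no = 'zi.*?.xml' does not.

On the other hand, with re.search, the pattern re_m = 'zi.*?.xml' finds the XML file name, but re_m = 'zi*?.xml' does not. Can you explain the difference?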

Answer 1

When you declare regex patterns, you should get into the habit of using raw strings:

re_no = r'zi*?.xml'     # `z` followed by
                        # as few `i` as possible, followed by
                        # any one character (see footnote), followed by
                        # `xml`

re_m = r'zi.*?\.xml'    # `zi` followed by
                        # as few characters as possible (see footnote), followed by
                        # `.xml` - the . here is literal, not "any character"

re_no = r'zi.*?\\.xml'  # similar to the above, but this looks for
                        # `zi` followed by
                        # as few characters as possible (see footnote), followed by
                        # a literal `\` followed by
                        # any one character, followed by `xml`
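
The readings above can be checked with a short script. This is only a sketch: the path and the other sample strings are made up for illustration, and `found` is a hypothetical helper that is not part of the original script.

```python
import re

# Hypothetical path, standing in for one of the files under D:\B02.
path = r'D:\B02\zi0012.xml'

lazy_i = r'zi*?.xml'         # z, as few i as possible, any one character, xml
literal_dot = r'zi.*?\.xml'  # zi, as few characters as possible, literal .xml
backslash = r'zi.*?\\.xml'   # zi, ..., literal backslash, any one character, xml


def found(pattern, text):
    """Return the matched substring, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m else None


print(found(lazy_i, path))               # None: "zi" is followed by digits
print(found(lazy_i, 'z.xml'))            # z.xml
print(found(literal_dot, path))          # zi0012.xml
print(found(backslash, path))            # None: no backslash after "zi"
print(found(backslash, r'zi12\.xml'))    # zi12\.xml
```

This also shows why re.search(...).group() fails in the question when re_m = 'zi*?.xml': re.search returns None for that pattern, and None has no group() method.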

Use

re_m = r'zi.*?\.xml'

and use http://regex101.com (switch to Python mode) to get an explanation of your regex (in plain text on the site) and to test it against test data you provide.
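
As for the glob half of the question: glob does not take regular expressions at all. It matches names with shell-style wildcards (through the fnmatch module), where * means any run of characters, ? means exactly one character, and . is just an ordinary dot. A minimal sketch, assuming a made-up file name zi0012.xml and using fnmatch.fnmatchcase so the result does not depend on the platform's case rules:

```python
import fnmatch
import re

name = 'zi0012.xml'  # hypothetical file name, for illustration only

# Wildcard reading: "zi", any run of characters, one character, ".xml".
glob_style = fnmatch.fnmatchcase(name, 'zi*?.xml')

# Wildcard reading: "zi", a literal ".", any run, one character, ".xml".
# The name has "0" right after "zi", so this does not match.
regex_style_in_glob = fnmatch.fnmatchcase(name, 'zi.*?.xml')

# A name that does have a dot right after "zi" matches the second pattern.
dotted = fnmatch.fnmatchcase('zi.a.xml', 'zi.*?.xml')

# fnmatch.translate converts a wildcard pattern into an equivalent regex.
as_regex = re.match(fnmatch.translate('zi*?.xml'), name) is not None

print(glob_style, regex_style_in_glob, dotted, as_regex)
# True False True True
```

So the same text means two different things: 'zi*?.xml' is a useful wildcard pattern but a poor regex, and 'zi.*?.xml' is a workable regex but a wildcard pattern that demands a literal dot after "zi".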

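Footnote:

The "anything" matched by . does not include, for example, newlines, unless you specify a flag such as re.DOTALL; see the documentation of the re module.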
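
A small sketch of the footnote, assuming a made-up string with a newline between "zi" and ".xml":

```python
import re

text = 'zi\n.xml'  # hypothetical input containing a newline

# By default . does not match a newline, so the pattern cannot bridge it.
default_match = re.search(r'zi.*?\.xml', text)

# With re.DOTALL the . also matches newlines.
dotall_match = re.search(r'zi.*?\.xml', text, re.DOTALL)

print(default_match)                # None
print(repr(dotall_match.group()))   # 'zi\n.xml'
```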
