I want to extract the names from the XML files in a local folder, and I use the following script:
import xml.etree.ElementTree as et
import os, glob, re

in_path = r'D:\B02'
out_path = r'D:\B02\summary.txt'
re_no = 'zi*?.xml'
re_m = 'zi.*?.xml'

def fetch_name(e):
    for nam in e.findall('PDEheader'):
        return nam.find('name').text

file_add = open(str(out_path), 'w')
for fileName in glob.glob(os.path.join(str(in_path), re_no)):
    re_name = fetch_name(et.parse(fileName))
    re_NO = re.search(re_m, fileName).group()
    file_add.write('{}, {}\n'.format(re_NO, re_name))
file_add.close()
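With the search pattern re_no = 'zi*?.xml' in the glob call I get the XML file paths, but with re_no = 'zi.*?.xml' I get nothing. On the other hand, with re_m = 'zi.*?.xml' as the re.search pattern I find the XML file name, but with re_m = 'zi*?.xml' I don't. Can you explain the difference?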
Answer (score: 2)
If you declare regex patterns, you should get into the habit of using raw strings:
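re_no = 'zi*?.xml'     # `z` followed by
                       # as few `i` as possible followed by
                       # any one character (see footnote) followed by
                       # `xml`
re_m = 'zi.*?\.xml'    # `zi` followed by
                       # as few arbitrary characters as possible (see footnote) followed by
                       # `.xml` - the `.` here is literal, not "any character"
                       # (this only works without r'' because `\.` is not a Python
                       # escape; recent Python versions warn about it)
re_no = 'zi.*?\\.xml'  # in a normal string `\\` is a single backslash, so the regex
                       # engine receives `zi.*?\.xml`, the same pattern as re_m above;
                       # only the raw string r'zi.*?\\.xml' would look for
                       # `zi` followed by
                       # as few arbitrary characters as possible followed by
                       # a literal `\` followed by
                       # any one character followed by `xml`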
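A quick way to check these readings is to run the patterns against a path like the ones glob returns in the question. This is a minimal sketch; the path `D:\B02\zi123.xml` is a hypothetical example, not one of the asker's actual files.

```python
import re

# Hypothetical file path, shaped like the paths glob returns in the question
path = r'D:\B02\zi123.xml'

# 'z', then as few 'i' as possible, then ANY one character, then 'xml':
# the path never has 'xml' directly after "z" + optional i's + one char
print(re.search('zi*?.xml', path))             # None

# 'zi', then as few characters as possible, then a literal '.xml'
print(re.search(r'zi.*?\.xml', path).group())  # zi123.xml

# In a normal string '\\' is one backslash, so this is the same regex string
print('zi.*?\\.xml' == r'zi.*?\.xml')          # True

# In a raw string, '\\' reaches the regex engine as an escaped (literal) backslash,
# so this pattern needs a backslash somewhere after 'zi'; the file name has none
print(re.search(r'zi.*?\\.xml', path))         # None
```

The `.group()` call in the question's script raises `AttributeError` whenever `re.search` returns `None`, which is what happens with `'zi*?.xml'`.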
Use

re_m = r'zi.*?\.xml'

and use http://regex101.com (switch to Python mode) to have your regex explained (in the text on the site) and tested against test data you supply: example for that
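On the glob side the same string means something different, because glob patterns are shell-style wildcards, not regular expressions: `*` is any run of characters, `?` is exactly one character, and `.` is a literal dot. A minimal sketch using `fnmatch`, whose rules `glob` applies to file names; the file name `zi123.xml` is hypothetical.

```python
import fnmatch
import re

name = 'zi123.xml'  # hypothetical file name

# Glob rules: '*' = any run of characters, '?' = exactly one character, '.' = literal dot
print(fnmatch.fnmatch(name, 'zi*?.xml'))    # True:  'zi' + '12' + '3' + '.xml'
print(fnmatch.fnmatch(name, 'zi.*?.xml'))   # False: glob demands a literal 'zi.' at the start
print(fnmatch.fnmatch(name, 'zi*.xml'))     # True:  simplest glob for "starts with zi, ends with .xml"

# fnmatch.translate shows the regex a glob pattern is converted to
# (the exact text differs between Python versions)
print(fnmatch.translate('zi*.xml'))

# Regex rules: the equivalent regex has to escape the dot itself
print(bool(re.search(r'zi.*?\.xml', name)))  # True
```

So `'zi.*?.xml'` fails as a glob because it requires a dot right after `zi`, while `'zi*?.xml'` fails as a regex for the reasons in the comments above; each string only happens to work in one of the two syntaxes.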
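Footnote:
The "any character" that `.` stands for does not cover, e.g., newlines unless you set certain flags (such as re.DOTALL); see the `re` documentation.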
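A small illustration of the footnote, using a hypothetical string that contains a newline between `zi` and `.xml`:

```python
import re

text = 'zi\n.xml'  # hypothetical string with a newline in the middle

# Without flags, '.' does not match '\n', so the lazy '.*?' cannot bridge it
print(re.search(r'zi.*?\.xml', text))                          # None

# With re.DOTALL, '.' also matches '\n'
print(repr(re.search(r'zi.*?\.xml', text, re.DOTALL).group()))  # 'zi\n.xml'
```

File names from glob normally contain no newlines, so this flag does not matter for the question's script; it matters when searching multi-line text.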