Question

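I am trying to write a regular expression for the following:

"Any number of characters, any characters, up to 'ch01' or 'chapter01', where the next character must not be a digit, then any number of characters, any characters, up to a period, and after the period there must be html or xhtml."

Sorry if this is confusing; some test cases may help more: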

import re

x = 'fdsafafsdch01fdsfdsf.xhtml' #pass
y = '9781599048970_ch01__s1_002.html' #pass
z = 'ch01.html' #pass
a = 'chapter019.xhtml' #fail
l = 'chapter01.html' #pass
m = 'chapter010-fn.xhtml' #fail
matches = [x, y, z, a, l, m]

for item in matches:
  # raw string so that \D and \. reach the regex engine unchanged
  print(bool(re.search(r'ch(apter)?01\D?.*\.x?html', item)))

(#fail == False, #pass == True)

Currently, all cases return True.

Answer 1

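The problem seems to be with \D?. It means "zero or one non-digit", so the regex parses chapter019 as "chapter01" followed by zero non-digits, followed by any characters (the .* consumes the 9), which it happily matches. Try making the ? apply to both the \D and the .* that follows it: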

for item in matches:
  print(bool(re.search(r'ch(apter)?01(\D.*)?\.x?html', item)))

Result:

True
True
True
False
True
False
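
To see the failure described above directly, the sketch below wraps the pieces of the original pattern in extra capture groups (added here only for inspection) and prints what each one matched against chapter019.xhtml. It also tries a negative lookahead, (?!\d), as one possible alternative way to say "the next character must not be a digit". The lookahead variant is an assumption of this sketch and is not part of the answer above; the names `names`, `diag`, `LOOKAHEAD` and `results` are made up for illustration.

```python
import re

# Test file names from the question, in the same order.
names = [
    'fdsafafsdch01fdsfdsf.xhtml',        # pass
    '9781599048970_ch01__s1_002.html',   # pass
    'ch01.html',                         # pass
    'chapter019.xhtml',                  # fail
    'chapter01.html',                    # pass
    'chapter010-fn.xhtml',               # fail
]

# Original pattern, with \D? and .* captured so their matches can be inspected.
diag = re.search(r'ch(apter)?01(\D?)(.*)\.x?html', 'chapter019.xhtml').groups()
print(diag)  # \D? matched the empty string and .* swallowed the '9'

# Hypothetical alternative: a negative lookahead rejects a digit right after
# "01" without consuming any characters.
LOOKAHEAD = re.compile(r'ch(?:apter)?01(?!\d).*\.x?html')
results = [bool(LOOKAHEAD.search(name)) for name in names]
print(results)
```

Because re.search does not anchor the pattern, either version also accepts names with extra text after the extension; appending $ to the pattern would require the name to end in html or xhtml.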

Regular expression for html/xhtml filenames
