I have the following regular expression in my Python code, and it is really long. Since Python is a whitespace-sensitive language, how can I clean it up?
matches = re.findall("((?:jan(?:(?:.)?|(?:uary)?)|feb(?:(?:.)?|(?:ruary)?)|mar(?:(?:.)?|(?:ch)?)|apr(?:(?:.)?|(?:il)?)|may|jun(?:(?:.)?|(?:e)?)|jul(?:(?:.)?|(?:y)?)|aug(?:(?:.)?|(?:gust)?)|sep(?:(?:.)?|(?:ept(?:(?:.)?))?|(?:tember)?)|oct(?:(?:.)?|(?:ober)?)|nov(?:(?:.)?|(?:ember)?)|dec(?:(?:.)?|(?:ember)?)) (?:[12][0-9]|[1-9]))",fileText,re.IGNORECASE)
Any help is greatly appreciated.
Answer 0: (score: 3)
Answer 1: (score: 1)
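I prefer to write complex regular expressions like this:
r"""(?x)
....
"""
where
r starts a raw string literal, so backslashes only have to be escaped once (for the regex engine, not for Python as well)
""" starts a multi-line string literal
(?x) turns on extended (verbose) mode: whitespace is ignored and comments are allowed
For your example: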
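date = r"""(?xi)
(?: # this is a comment
    jan (?: \.|uary)?
  | feb (?: \.|ruary)?
  | mar (?: \.|ch)?
  | apr (?: \.|il)?
  # ... the remaining months follow the same pattern
)
[ ] # a literal space; bare whitespace is ignored in verbose mode
(?: # well, how about 30, 31?
    [12][0-9] | [1-9]
)
"""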
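Inline flags such as (?xi) are more readable than the re.XXX constants, because they are tied to the expression itself and travel with it. Note that they have to appear at the very start of the pattern; Python 3.11 and later reject global inline flags anywhere else.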
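As a rough end-to-end sketch of the verbose style, the snippet below spells out all twelve months and runs the pattern against a sample string. The full month list, the "sept." variant, the 30/31 alternative, and the names DATE_RE and sample are my own assumptions for illustration, not the answerer's exact pattern.

```python
import re

# Verbose pattern: whitespace and "# ..." comments inside the string are
# ignored, so the literal space between month and day is written as [ ].
DATE_RE = re.compile(r"""(?xi)
    (?:                               # month: short form, short form + dot, or full name
        jan (?: \. | uary )?
      | feb (?: \. | ruary )?
      | mar (?: \. | ch )?
      | apr (?: \. | il )?
      | may
      | jun (?: \. | e )?
      | jul (?: \. | y )?
      | aug (?: \. | ust )?
      | sep (?: \. | t\.? | tember )?
      | oct (?: \. | ober )?
      | nov (?: \. | ember )?
      | dec (?: \. | ember )?
    )
    [ ]                               # one literal space
    (?: 3[01] | [12][0-9] | [1-9] )   # day of month; 3[01] first so "30" is not cut to "3"
""")

sample = "ght july 24 tiren august 23 hyu jan. 11 and Sept. 30"
matches = DATE_RE.findall(sample)  # no capturing groups, so whole matches are returned
print(matches)  # ['july 24', 'august 23', 'jan. 11', 'Sept. 30']
```

Because every group is non-capturing, findall returns the full matched text. Wrap the month or the day in plain parentheses if you want them back separately.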
Answer 2: (score: 0)
Is this what you want?
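import re

# Python concatenates adjacent string literals, so the pattern can be split
# across lines. Raw strings pass "\." to the regex engine unchanged.
regx = re.compile(r"("
                  r"(?:"
                  r"jan(?:\.|uary)"
                  r"|"
                  r"feb(?:\.|ruary)"
                  r"|"
                  r"mar(?:\.|ch)"
                  r"|"
                  r"apr(?:\.|il)"
                  r"|"
                  r"may"
                  r"|"
                  r"ju(?:n[.e]|l[.y])"
                  r"|"
                  r"aug(?:\.|ust)"
                  r"|"
                  r"sep(?:\.|tember)"
                  r"|"
                  r"oct(?:\.|ober)"
                  r"|"
                  r"(?:nov|dec)(?:\.|ember)"
                  r")"
                  # 3[01] is tried first; otherwise [1-9] would match only the "3" of "30"
                  r" (?:3[01]|[12][0-9]|[1-9])"
                  r")",
                  re.IGNORECASE)
s = "ght july 24 tiren august 23 hyu jan. 11"
print(regx.findall(s))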
Result
['july 24', 'august 23', 'jan. 11']
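Inside square brackets (a character class), the dot loses its special meaning, so [.e] matches either a literal dot or an e, and [.y] either a literal dot or a y.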
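A small sketch of that last point, comparing a dot inside a character class with a bare dot. The variable names and the three test words are made up for illustration.

```python
import re

# Inside [...] the dot is literal: "jun[.e]" accepts only "jun." or "june".
in_class = re.compile(r"jun[.e]")
# Outside a class, an unescaped dot matches any character except a newline.
bare_dot = re.compile(r"jun.")

words = ["jun.", "june", "junk"]
class_hits = [w for w in words if in_class.fullmatch(w)]
dot_hits = [w for w in words if bare_dot.fullmatch(w)]

print(class_hits)  # ['jun.', 'june']
print(dot_hits)    # ['jun.', 'june', 'junk']
```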