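I have a Perl program that uses a regular expression to store the supported file extensions, and it reuses that regex throughout the code. Each file extension carries a description, which is possible because the regex uses the 'x' flag. I can't figure out how to port this to Python (2.7).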
use strict;
my @files = ('foo.abc','foo.ABC','foo.mi','foo.txt','foo.ma','foo.iff','foo.avi');
my $exts = qr/abc|mi|avi|ma|iff|tga/;
foreach my $f (sort @files) {
    if ($f =~ m/^([^.]+\.$exts)/) {
        print "file matches: $f\n";
    }
    else {
        print "file does not match: $f\n";
    }
}
file does not match: foo.ABC
file matches: foo.abc
file matches: foo.avi
file matches: foo.iff
file matches: foo.ma
file matches: foo.mi
file does not match: foo.txt
When I use the /x modifier:
$exts = qr/
abc (?# alembic )
|mi (?# mentalray )
|avi (?# windows video )
|ma (?# maya ascii )
|iff (?# amiga bitmap )
|tga (?# targa bitmap )
/ix;
foreach my $f (sort @files) {
    if ( $f =~ m/^([^.]+\.$exts)/ ) {
        print "file matches: $f\n";
    }
    else {
        print "file does not match: $f\n";
    }
}
file matches: foo.ABC
file matches: foo.abc
file matches: foo.avi
file matches: foo.iff
file matches: foo.ma
file matches: foo.mi
file does not match: foo.txt
Python supports compiled regular expressions, and you can use them as components of other regular expressions:
import re
files = [ 'foo.abc','foo.ABC','foo.mi','foo.txt','foo.ma','foo.iff','foo.avi' ]
exts = re.compile(r'(?:abc|mi|avi|ma|iff|tga)')
for f in sorted(files):
    m = re.search(r'^([^.]+\.{EXTS})'.format(EXTS=exts.pattern), f)
    if m:
        print 'file matches: {0}'.format(f)
    else:
        print 'file does not match: {0}'.format(f)
file does not match: foo.ABC
file matches: foo.abc
file matches: foo.avi
file matches: foo.iff
file matches: foo.ma
file matches: foo.mi
file does not match: foo.txt
But as soon as I use re.VERBOSE, the regex fails:
exts = re.compile(r'''(?:
abc # alembic
|mi # mentalray
|avi # windows video
|ma # maya ascii
|iff # amiga bitmap
|tga # targa bitmap
)''', re.IGNORECASE + re.VERBOSE)
for f in sorted(files):
    m = re.search(r'^([^.]+\.{EXTS})'.format(EXTS=exts.pattern), f)
    if m:
        print 'file matches: {0}'.format(f)
    else:
        print 'file does not match: {0}'.format(f)
file does not match: foo.ABC
file does not match: foo.abc
file does not match: foo.avi
file does not match: foo.iff
file does not match: foo.ma
file does not match: foo.mi
file does not match: foo.txt
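My actual code has more than 50 extensions, each with a comment describing what it is, so I really want to support this.
I searched every "nested regex" post I could find, but all of them are string hacks. I could not find any actual regex nesting.
Can Python do this?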
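Answer 0 (score: 4)
You are doing this completely wrong. First of all, the .pattern attribute is just a string. So it is 100% useless to call re.compile and then pull out the initial string that was used to build the regex object, only to pass it to re.search: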
>>> regex = re.compile(r'''(
... verbose #lol
... | pattern #rofl
... )
... ''', re.VERBOSE)
>>> regex.match('verbose') # finds the match!
<_sre.SRE_Match object; span=(0, 7), match='verbose'>
>>> re.search(regex.pattern, 'verbose') # does not find the match!
>>>
As you can see, the pattern attribute is just the initial string that was used to build the regex object:
>>> regex.pattern
'(\n verbose #lol\n | pattern #rofl\n)\n'
>>> type(regex.pattern)
<class 'str'>
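So by passing it to re.search you make re.search recompile it, and since that call is not given the re.VERBOSE flag, it compiles the string with a different meaning. Passing the flag explicitly restores the match: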
>>> re.search(regex.pattern, 'verbose', re.VERBOSE)
<_sre.SRE_Match object; span=(0, 7), match='verbose'>
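If the compiled object has to stay as it is, one possible workaround is to hand its stored flags back to re when the pattern text is reused. The sketch below assumes a shortened extension list and a hypothetical helper named matches; note that the flags then apply to the whole combined pattern, and that the embedded text must not end in a comment, or the closing parenthesis appended after it would be swallowed.

```python
import re

# Shortened extension list, compiled with the same flags as in the question.
exts = re.compile(r'''(?:
    abc    # alembic
    |mi    # mentalray
    |avi   # windows video
    )''', re.IGNORECASE | re.VERBOSE)

def matches(filename):
    """Hypothetical helper: embed exts.pattern and reuse exts.flags."""
    pattern = r'^([^.]+\.{0})'.format(exts.pattern)
    # exts.flags carries IGNORECASE and VERBOSE, so the embedded text is
    # parsed the same way it was when exts itself was compiled.
    return re.search(pattern, filename, exts.flags) is not None

for name in ['foo.abc', 'foo.ABC', 'foo.txt']:
    print('{0}: {1}'.format(name, matches(name)))
```

This keeps the commented pattern intact, at the cost of recompiling (or cache-looking-up) the combined pattern on every call.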
Also, I would do it like this instead:
exts = [
'abc', # extension abc blah blah
'cde', # extension cde blah blah
]
exts_pattern = '(?:{})'.format('|'.join(re.escape(extension) for extension in exts))
regex = re.compile(r'^([^.]+\.{})'.format(exts_pattern), re.IGNORECASE)
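or something similar. That is, you keep the extensions in a list, put whatever Python comments you want next to them, and iterate over them when you build the regex object with compile. This makes it easier to add extensions, and a list like this is likely to be useful elsewhere anyway.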
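And to answer your last question: no, Python's re module does not support "regex nesting" in any way. You have to supply a string pattern, which is then compiled into a regex object.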
Answer 1 (score: 1)
It is a myth that Perl inserts one compiled regular expression into another. If you write this
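my $exts = qr/ abc | mi | avi | ma | iff | tga /x;
if ( $f =~ /^([^.]+\.$exts)/ ) {
    ...
}
then within $f =~ /^([^.]+\.$exts)/ the contents of the regex pattern are evaluated in a double-quote context. That means Perl stringifies $exts to
(?^x: abc | mi | avi | ma | iff | tga )
(the exact result depends on which Perl pragmas are in effect) and compiles the pattern from the string after interpolation.
So the regex match is actually doing this
$f =~ /^([^.]+\.(?^x: abc | mi | avi | ma | iff | tga ))/
which is clearly correct, because the /x modifier is enabled inside the embedded subexpression.
The only difference in Python is that its re objects are not as careful about the text they hand back, so that text cannot be safely injected as a substring into other patterns.
As far as I can tell, pattern just returns the original regex string that was compiled to create the object. That makes it more like using a C preprocessor macro: you have to be very careful about parentheses, both in the original definition and where it is used.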
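The stringification that Perl performs can be approximated in Python 3.6 or later, where scoped inline flags such as (?ix:...) are available. The following is only a sketch: embed is a hypothetical helper, not part of the re module, and it handles only the i, m, s and x flags.

```python
import re

# Flags that can be expressed as scoped inline flags (Python 3.6+).
_INLINE = (
    (re.IGNORECASE, 'i'),
    (re.MULTILINE, 'm'),
    (re.DOTALL, 's'),
    (re.VERBOSE, 'x'),
)

def embed(compiled):
    """Hypothetical helper: return pattern text wrapped in its own flags."""
    letters = ''.join(letter for flag, letter in _INLINE
                      if compiled.flags & flag)
    if not letters:
        return '(?:{0})'.format(compiled.pattern)
    # In verbose mode, add a newline so a trailing comment in the pattern
    # cannot swallow the closing parenthesis.
    tail = '\n' if compiled.flags & re.VERBOSE else ''
    return '(?{0}:{1}{2})'.format(letters, compiled.pattern, tail)

exts = re.compile(r'''
      abc   # alembic
    | mi    # mentalray
    | avi   # windows video
    | ma    # maya ascii
    | iff   # amiga bitmap
    | tga   # targa bitmap
''', re.IGNORECASE | re.VERBOSE)

# The outer pattern is compiled without flags; they travel with the fragment.
file_re = re.compile(r'^([^.]+\.{0})'.format(embed(exts)))

for name in ['foo.ABC', 'foo.iff', 'foo.txt']:
    print('{0}: {1}'.format(name, bool(file_re.match(name))))
```

Because the flags are scoped to the group, the surrounding pattern keeps its own settings, which is roughly what Perl's (?^x:...) wrapper achieves.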
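Answer 2 (score: 0)
Yes, it can! In the Python documentation for re, I found that you can specify any re flag inside the expression itself, similar to how Perl prints the flags inline. Adding that to the string hack gives the desired result: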
exts = '''(?ix)(?:
abc # alembic
|mi # mentalray
|avi # windows video
|ma # maya ascii
|iff # amiga bitmap
|tga # targa bitmap
)'''
for f in sorted(files):
    m = re.search(r'^([^.]+\.{EXTS})'.format(EXTS=exts), f)
    if m:
        print 'file matches: {0}'.format(f)
    else:
        print 'file does not match: {0}'.format(f)
(?ix) is non-grouping, but it sets re.IGNORECASE and re.VERBOSE.
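One caveat for anyone moving this to Python 3: global inline flags in the middle of a pattern were deprecated in Python 3.6 and are rejected from Python 3.11 on, so (?ix) has to sit at the very start of the combined pattern. A minimal variant under that assumption, with file_re as a hypothetical name, might look like this:

```python
import re

# The commented fragment no longer carries its own flags.
exts = r'''(?:
    abc    # alembic
    |mi    # mentalray
    |avi   # windows video
    |ma    # maya ascii
    |iff   # amiga bitmap
    |tga   # targa bitmap
    )'''

# (?ix) is hoisted to the start, so it applies to the whole pattern.
file_re = re.compile(r'(?ix)^([^.]+\.{EXTS})'.format(EXTS=exts))

files = ['foo.abc', 'foo.ABC', 'foo.mi', 'foo.txt']
for f in sorted(files):
    if file_re.search(f):
        print('file matches: {0}'.format(f))
    else:
        print('file does not match: {0}'.format(f))
```

Since verbose mode now covers the outer pattern as well, any literal whitespace or # added there later would need escaping.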