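I'm writing a Python script that extracts the featured artist's name from an mp3 filename and sets the corresponding ID3v2 tag on the file. The filenames come in three different formats: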
Artist - Track ft. FeatArtist.mp3
Artist ft. FeatArtist - Track.mp3
Artist - Track (ft. FeatArtist).mp3
This is the regular expression I wrote:
r'ft\. (.+)[.\-)]'
I can then use re.findall to get the contents of the group. But this is what I get:
In [40]: r = r'ft\. (.+)[.\-)]'
In [47]: re.findall(r, 'Artist - Track ft. FeatArtist.mp3')
Out[47]: ['FeatArtist']
In [48]: re.findall(r, 'Artist ft. FeatArtist - Track.mp3')
Out[48]: ['FeatArtist - Track']
In [49]: re.findall(r, 'Artist - Track (ft. FeatArtist).mp3')
Out[49]: ['FeatArtist)']
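My expected output in all three cases is exactly:
FeatArtist
The problem is that the regex matches as much as possible. I want it to stop as soon as it finds one of the characters in [.\-)]. How can I do that?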
Answer 0: (score: 1)
re.findall(r'ft\.\s*(\w*)',filename)
For each of the following filenames:
Artist - Track ft. FeatArtist.mp3
Artist ft. FeatArtist - Track.mp3
Artist - Track (ft. FeatArtist).mp3
this will return:
['FeatArtist']
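In the examples you provided, each FeatArtist is terminated by one of the following: a space followed by -, a closing parenthesis, or the file extension .mp3.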
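As a quick check, the simple pattern can be run against the three filenames from the question. This is only a sketch; the names PATTERN, filenames and results are illustrative and not part of the original answer.

```python
import re

# Simple pattern from the answer: capture the run of word characters after "ft."
PATTERN = re.compile(r'ft\.\s*(\w*)')

filenames = [
    'Artist - Track ft. FeatArtist.mp3',
    'Artist ft. FeatArtist - Track.mp3',
    'Artist - Track (ft. FeatArtist).mp3',
]

# \w* stops at the first non-word character: ".", " " or ")"
results = [PATTERN.findall(name) for name in filenames]
for name, found in zip(filenames, results):
    print(name, '->', found)
```

Because \w does not match a space, a dot, or &, this pattern only works while the featured artist is a single word.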
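If we had any of the following instead:
Feat.Artist
Feat Artist
Feat Middlename Artist
Feat Artist One & Artist Two
things could break down. One way to handle these variants might be:
First, remove the file extension without using string matching. Doing this on the filename gives you a cleaner starting point.
With os.path.splitext('Artist - Track ft. FeatArtist.mp3')[0] you get the filename in the form: Artist - Track ft. FeatArtist
Then match against that: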
re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', filename)
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist - Track ft. FeatArtist')
['FeatArtist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist ft. FeatArtist - Track')
['FeatArtist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist - Track (ft. FeatArtist)')
['FeatArtist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist - Track (ft. Feat Artist)')
['Feat Artist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist - Track (ft. Feat Artist & Other Artist)')
['Feat Artist & Other Artist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist ft. Feat Artist & Other Artist - Track')
['Feat Artist & Other Artist']
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)', 'Artist ft. Feat.Artist & Crew - Track')
['Feat.Artist & Crew']
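The two steps (strip the extension, then apply the lookahead pattern) could be wrapped in one helper. This is a minimal sketch: the function name featured_artist and the constant FEAT_RE are hypothetical, and it assumes every filename really ends in an extension such as .mp3.

```python
import os.path
import re

# Lazy capture after "ft.", ending just before " -", ")" or the end of the string
FEAT_RE = re.compile(r'ft\.\s*(\w*.*?)(?= -|\)|$)')


def featured_artist(filename):
    """Return the featured artist found in filename, or None if there is none."""
    stem = os.path.splitext(filename)[0]  # drop the extension without a regex
    match = FEAT_RE.search(stem)
    return match.group(1) if match else None


print(featured_artist('Artist ft. Feat.Artist & Crew - Track.mp3'))
```

re.search is used here instead of re.findall because only the first "ft." in a filename is of interest. Returning None for a filename without "ft." lets the caller skip writing the tag.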
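From the Python documentation (formatting added):
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left to right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
So you can still use repetition operators to build up the match, and use a group to control which part of the match is returned.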
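To make the effect of groups concrete, here is a small sketch comparing re.findall with zero, one, and two groups on the same string. The variable names are illustrative only.

```python
import re

text = 'Artist ft. FeatArtist - Track'

# No group: findall returns the whole match, including the "ft." prefix
no_group = re.findall(r'ft\.\s*\w+', text)

# One group: findall returns only what the group captured
one_group = re.findall(r'ft\.\s*(\w+)', text)

# Two groups: findall returns one tuple per match
two_groups = re.findall(r'(ft\.)\s*(\w+)', text)

print(no_group)    # ['ft. FeatArtist']
print(one_group)   # ['FeatArtist']
print(two_groups)  # [('ft.', 'FeatArtist')]
```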
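If your regex engine supports \K, the match is everything after the \K. Note that \K is not a backreference: it discards everything matched so far from the reported match. Python's built-in re module does not support it. An example using grep with -P (Perl-compatible regex) and -o (print only the match):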
echo "Artist - Track ft. FeatArtist" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist ft. FeatArtist - Track" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist - Track (ft. FeatArtist)" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist ft. Feat Artist & Other Artist - Track" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
Feat Artist & Other Artist
Answer 1: (score: 0)
This should work:
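(?<=ft\. )[^\-)\. ]+
The lookbehind (?<=ft\. ) finds the string that comes right after "ft. ".
The string must be a single word: it cannot contain a space, dash, parenthesis, or dot.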
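A short sketch of how the lookbehind pattern behaves in Python, including the single-word limitation. The names LOOKBEHIND_RE, single and multi are illustrative.

```python
import re

# Lookbehind requires the literal "ft. " before the match; the negated class
# then consumes characters until a dash, ")", dot or space is reached.
LOOKBEHIND_RE = re.compile(r'(?<=ft\. )[^\-)\. ]+')

single = LOOKBEHIND_RE.findall('Artist - Track (ft. FeatArtist).mp3')
multi = LOOKBEHIND_RE.findall('Artist ft. Feat Artist - Track.mp3')

print(single)  # ['FeatArtist']
print(multi)   # ['Feat'] (the match stops at the first space)
```

Since the pattern has no capturing group, re.findall returns the whole match, and the file extension does not need to be removed first.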