Regex with a variable number of matching groups

Time: 2015-05-22 08:17:06

Tags: python regex pattern-matching

I would like to be able to write patterns that identify the filenames in a list:

import re

NOTES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]

filelist1 = ["piano c3.wav", "piano c#3.wav", "piano d4.wav"]
pattern1 = "piano %notename.wav"

filelist2 = ["72__54.wav", "60__127.wav", "48__61.wav"]
pattern2 = "%midinote__%velocity.wav"

Keywords:

  • %midinote and %velocity should be integers
  • %notename should be a string from the list NOTES, followed by an octave digit (e.g. c#3)
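A side note on the second rule: the character class `[A-Ga-g]#?[0-9]` used further down in this thread also accepts names such as `e#3`, which are not in NOTES. A minimal sketch of building the fragment from the list itself follows; the names `note_alternatives`, `KEYWORD_REGEX` and `notename_re` are illustrative assumptions, not part of the original code.

```python
import re

NOTES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]

# Longest names first, so that "c#" is tried before "c".
note_alternatives = "|".join(
    re.escape(n) for n in sorted(NOTES, key=len, reverse=True)
)

# Hypothetical table: keyword -> regex fragment (no capture groups yet).
KEYWORD_REGEX = {
    "%midinote": r"\d+",
    "%velocity": r"\d+",
    "%notename": r"(?:{0})\d".format(note_alternatives),
}

# IGNORECASE lets "C#3" match although NOTES is lowercase.
notename_re = re.compile(KEYWORD_REGEX["%notename"], re.IGNORECASE)

print(bool(notename_re.fullmatch("C#3")))  # a name from NOTES plus an octave
print(bool(notename_re.fullmatch("e#3")))  # "e#" is not in NOTES
```

Whether rejecting `e#3` is desirable depends on how the sample files are named, so treat this as one option rather than a required change.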

The following code parses the filenames, but only when all 3 keywords are present in the pattern, in the order %midinote, %velocity, %notename:
pattern1 = pattern1.replace("%midinote", r"(\d+)").replace("%velocity", r"(\d+)").replace("%notename", r"([A-Ga-g]#?[0-9])")
for fname in filelist1:
    m = re.match(pattern1, fname)
    if m:
        midinote = int(m.groups()[0])
        velocity = int(m.groups()[1])
        notename = m.groups()[2]
        notenametomidi = NOTES.index(notename[:-1].lower()) + (int(notename[-1])+2) * 12
        print(fname, midinote, velocity, notename, notenametomidi)

But if a pattern:

  • has only 1 or 2 keywords,

  • or has 3 keywords in an order different from the one defined above,

then the code fails.
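To make the failure concrete, here is a minimal reproduction. It is a sketch that reuses the same substitutions with the one-keyword pattern; the variable names `groups` and `error` are only illustrative.

```python
import re

# Same substitutions as in the code above, applied to a pattern
# that contains only one keyword.
pattern = "piano %notename.wav"
pattern = (pattern.replace("%midinote", r"(\d+)")
                  .replace("%velocity", r"(\d+)")
                  .replace("%notename", r"([A-Ga-g]#?[0-9])"))

m = re.match(pattern, "piano c#3.wav")
groups = m.groups()
print(groups)  # only one group exists, and it holds the note name

error = None
try:
    # The positional code expects the velocity at index 1.
    velocity = int(groups[1])
except IndexError as exc:
    error = exc
    print("positional lookup failed:", exc)
```

The group positions depend on which keywords the pattern contains and in what order, so fixed indices cannot work for every pattern.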

How can I write a regex with a variable number of matching groups?

2 answers:

Answer 0: (score: 1)

I think what you are looking for are called named capture groups. Try this:

pattern1 = pattern1.replace("%midinote", r"(?P<midinote>\d+)").replace("%velocity", r"(?P<velocity>\d+)").replace("%notename", r"(?P<notename>[A-Ga-g]#?[0-9])")
for fname in filelist1:
    m = re.match(pattern1, fname)
    if m:
        info = m.groupdict()
        midinote = int(info.get('midinote',0))
        velocity = int(info.get('velocity',0))
        notename = info.get('notename', 'c0')  # default needs an octave digit; 'c0' is an arbitrary placeholder
        notenametomidi = NOTES.index(notename[:-1].lower()) + (int(notename[-1])+2) * 12
        print(fname, midinote, velocity, notename, notenametomidi)

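Of course, you will have to change the default values to suit your needs.

Answer 1: (score: 1)

You want to use named capture groups. Here are a few functions, along with some demo code: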

# extract_midi_info.py

# For Python 2/3 compatibility
from __future__ import print_function

import re

NOTES = ("c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b")


def notename_to_midi(notename):
    return NOTES.index(notename[:-1].lower()) + (int(notename[-1])+2) * 12


def extract_midi_info(pattern, s):
    pattern = pattern.replace("%midinote", r"(?P<midinote>\d+)")
    pattern = pattern.replace("%velocity", r"(?P<velocity>\d+)")
    pattern = pattern.replace("%notename", r"(?P<notename>[A-Ga-g]#?[0-9])")

    m = re.match(pattern, s)

    if m:
        info = m.groupdict()
        if 'midinote' in info:
            info['midinote'] = int(info['midinote'])
        if 'velocity' in info:
            info['velocity'] = int(info['velocity'])
        if 'notename' in info:
            info['notename_midi'] = notename_to_midi(info['notename'])
    else:
        info = {}

    return info


def main():
    filelist_a = ["bonjour c3.wav", "bonjour c#3.wav", "bonjour d4.wav"]
    pattern_a = "bonjour %notename.wav"

    filelist_b = ["72__54.wav", "60__127.wav", "48__61.wav"]
    pattern_b = "%midinote__%velocity.wav"

    samples = [('A', filelist_a, pattern_a), ('B', filelist_b, pattern_b)]

    for name, filelist, pattern in samples:
        print()
        print('Filelist {0}'.format(name))
        for filename in filelist:
            info = extract_midi_info(pattern, filename)
            print(info)

    print()


if __name__ == '__main__':
    main()
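One thing the code in this thread does not handle is that the literal parts of a pattern are passed to `re` unescaped, so the `.` in `.wav` matches any character. A possible sketch follows, assuming a hypothetical helper `compile_pattern` and keyword table `KEYWORDS` (neither comes from the answers). It escapes the literal text with `re.escape` and only then substitutes the named groups:

```python
import re

# Hypothetical keyword table; the fragments mirror those used in the answers.
KEYWORDS = {
    "%midinote": r"(?P<midinote>\d+)",
    "%velocity": r"(?P<velocity>\d+)",
    "%notename": r"(?P<notename>[A-Ga-g]#?[0-9])",
}

# Matches any keyword; everything between keywords is literal text.
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS))


def compile_pattern(pattern):
    """Turn a user pattern such as '%midinote__%velocity.wav' into a regex."""
    parts = []
    pos = 0
    for m in KEYWORD_RE.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text, escaped
        parts.append(KEYWORDS[m.group(0)])               # keyword -> named group
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))               # trailing literal text
    # \Z anchors the end, so extra trailing characters are rejected.
    return re.compile("".join(parts) + r"\Z")


regex = compile_pattern("%midinote__%velocity.wav")
print(regex.match("72__54.wav").groupdict())
print(regex.match("72__54xwav"))  # the dot is now literal, so no match
```

A keyword that appears twice in one pattern would produce a duplicate group name, which `re` rejects, so this sketch assumes each keyword is used at most once.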