Regular expression to return 2 groups

Time: 2014-03-08 10:36:46

Tags: python regex

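I have the following data and need to extract 2 groups from it. The first group should be the 3-letter code between var and _Destinations. The second group should contain all the 3-letter codes after Array (without the single quotes). Lines that start with // should be skipped.

Below is the regex I have so far; any help is appreciated.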

var\s([A-Z]{3})_Destinations\s*=\snew\sArray\((?:,?)|(\'([A-Z]{3})\')*

var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0); #Don't need any of this

//var NOC_Destinations  = new Array('BHX'); # Don't need any of this
var ABZ_Destinations    = new Array('DUB'); # Need this
//var RNS_Destinations  = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations    = new Array('ORK','DUB','SNN'); # Need this

1 answer:

Answer 0: (score: 1)

While @thefourtheye is correct, as long as your use case is limited to the example provided, you can do:

text = """
//var NOC_Destinations  = new Array('BHX'); # Don't need any of this
var ABZ_Destinations    = new Array('DUB'); # Need this
//var RNS_Destinations  = new Array('ORK','DUB'); Don't need this
var BHX_Destinations    = new Array('ORK','DUB','SNN'); # Need this
"""

import re
import ast

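# Raw string so that \( and \) reach the regex engine unchanged; re.M makes ^ match at each line start,
# which skips the lines beginning with //.
from_to = {frm: ast.literal_eval(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'ABZ': 'DUB', 'BHX': ('ORK', 'DUB', 'SNN')}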

You may want to normalize the values in some way, for example by making sure they are all strings, or all tuples/lists, and so on. Something like:

def to_list(text):
    parsed = ast.literal_eval(text)
    if isinstance(parsed, str):  # a single code parses to a bare string (use basestring on Python 2)
        return [parsed]
    return list(parsed)


from_to = {frm: to_list(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'ABZ': ['DUB'], 'BHX': ['ORK', 'DUB', 'SNN']}
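
If you would rather not evaluate the captured text with ast.literal_eval, a two-pass regex is one possible alternative. The sketch below is not part of the original answer. It assumes every code is exactly three uppercase letters in single quotes, and the names line_re and code_re are made up for illustration. Each value comes back as a list, so no extra normalization step is needed:

```python
import re

text = """
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0);
//var NOC_Destinations  = new Array('BHX');
var ABZ_Destinations    = new Array('DUB');
//var RNS_Destinations  = new Array('ORK','DUB');
var BHX_Destinations    = new Array('ORK','DUB','SNN');
"""

# Pass 1: match only lines that begin with "var XXX_Destinations".
# Lines starting with // fail the ^var anchor, and "var Dests" lacks the XXX_Destinations name.
line_re = re.compile(r"^var\s+([A-Z]{3})_Destinations\s*=\s*new\s+Array\s*\(([^)]*)\)", re.M)

# Pass 2: pull every quoted three-letter code out of the captured argument list.
code_re = re.compile(r"'([A-Z]{3})'")

from_to = {frm: code_re.findall(args) for frm, args in line_re.findall(text)}
print(from_to)
```

The trade-off is that this version silently ignores anything in the argument list that is not a quoted three-letter code, whereas ast.literal_eval would raise an error on input that is not a valid Python literal.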