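I have the following data and need to extract two groups from it. For the first group I need the 3-letter code between var and _Destinations. For the second group I need all of the 3-letter codes that follow Array (without the single quotes), but not from lines that start with //.
Below is the regex I have so far. Any help is appreciated.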
var\s([A-Z]{3})_Destinations\s*=\snew\sArray\((?:,?)|(\'([A-Z]{3})\')*
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0); #Don't need any of this
//var NOC_Destinations = new Array('BHX'); # Don't need any of this
var ABZ_Destinations = new Array('DUB'); # Need this
//var RNS_Destinations = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations = new Array('ORK','DUB','SNN'); # Need this
Answer 0 (score: 1)
While @thefourtheye is correct, as long as your use case is limited to the examples provided, you can do:
text = """
//var NOC_Destinations = new Array('BHX'); # Don't need any of this
var ABZ_Destinations = new Array('DUB'); # Need this
//var RNS_Destinations = new Array('ORK','DUB'); Don't need this
var BHX_Destinations = new Array('ORK','DUB','SNN'); # Need this
"""
import re
import ast
# ^ with re.M anchors at each line start, so lines beginning with // are skipped
from_to = {frm: ast.literal_eval(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'ABZ': 'DUB', 'BHX': ('ORK', 'DUB', 'SNN')}
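The mixed value types in that result come from how ast.literal_eval parses the captured text: a single quoted code is a string literal, while comma-separated codes form a tuple. A quick check, using two captured strings copied from the sample data:

```python
import ast

# One quoted code parses as a plain string
single = ast.literal_eval("'DUB'")
# Comma-separated quoted codes parse as a tuple
several = ast.literal_eval("'ORK','DUB','SNN'")

print(type(single).__name__, single)    # str DUB
print(type(several).__name__, several)  # tuple ('ORK', 'DUB', 'SNN')
```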
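You may want to consider normalizing the values in some way, for example by making sure they are all strings, or all tuples/lists, and so on. Something like: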
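def to_list(text):
    # Parse the captured Array(...) arguments as a Python literal
    parsed = ast.literal_eval(text)
    # A single quoted code parses to a plain string, so wrap it in a list
    if isinstance(parsed, str):  # on Python 2, test against basestring instead
        return [parsed]
    return list(parsed)
from_to = {frm: to_list(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'ABZ': ['DUB'], 'BHX': ['ORK', 'DUB', 'SNN']}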
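If you would rather not evaluate the captured text at all, a two-step regex is one possible alternative. This is only a sketch and not part of the approach above: the names line_re and code_re are made up for illustration, and it assumes every declaration sits on a single line, as in the sample data.

```python
import re

text = """
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0);
//var NOC_Destinations = new Array('BHX');
var ABZ_Destinations = new Array('DUB');
//var RNS_Destinations = new Array('ORK','DUB');
var BHX_Destinations = new Array('ORK','DUB','SNN');
"""

# Step 1: anchor at the line start so lines beginning with // never match,
# then capture the origin code and the raw text inside Array(...).
line_re = re.compile(
    r"^var ([A-Z]{3})_Destinations\s*=\s*new\s+Array\s*\((.*?)\)", re.M
)

# Step 2: pull every quoted 3-letter code out of the captured argument text.
code_re = re.compile(r"'([A-Z]{3})'")

from_to = {frm: code_re.findall(args) for frm, args in line_re.findall(text)}
print(from_to)
# {'ABZ': ['DUB'], 'BHX': ['ORK', 'DUB', 'SNN']}
```

Because findall always returns a list, every value has the same type with no extra normalization step, and the quotes are dropped by the capture group.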