Python - Improving regexp-based output parsing

Time: 2017-04-12 06:55:56

Tags: python regex if-statement

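I'm writing a function that takes some output and, based on its content, populates a dictionary with objects. The objects come in two kinds. Depending on which section of the text document the function is currently passing through, I identify Type1 or Type2 objects in the output and fill them with the relevant data. Type1 objects normally live in the State1 section of the document, Type2 objects in the State2 section. I rely mostly on elif statements: I process each line of the input text file (passed to the function as a list) and check its content with regular expressions. However, the code is becoming unmanageable, because I funnel every line through all the ifs. Is there a way to make this code better?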

def func(lines):  # renamed from `list` so the builtin is not shadowed

    # function-local variables
    state = ''
    state1_specific_value1 = ''
    state1_specific_value2 = ''
    state1_specific_value3 = ''
    state2_specific_value1 = ''
    state2_specific_value2 = ''
    state2_specific_value3 = ''

    for i in lines:

        if REGEXP_DICTIONARY['state1_regexp'].match(i):
            # processing state1 section
            state = 'State1'
        elif REGEXP_DICTIONARY['state2_regexp'].match(i):
            # processing state2 section
            state = 'State2'
        elif REGEXP_DICTIONARY['interesting_line1_regexp'].match(i):
            # detecting some special conditions for a jar. Is it twistable?
            # not dependent on state
            jar_dict[jar].Twistable = True

        elif REGEXP_DICTIONARY['type'].match(i):
            # quick clean up of the jar related string to get jar's name
            jar_type = clean(i.replace("  blablabla ", ""))
            #
            # making decisions based on State delivered from previous lines and Type detected
            # (compare jar_type, not the builtin `type`)
            #
            if state == "State1" and jar_type == "Type1":
                debug("We detected State1 and Type 1")
            elif state == "State2" and jar_type == "Type2":
                debug("We detected State2 and Type 2")
            else:
                debug("inconsistency detected: type is {}, state is {}".format(jar_type, state))

        # State 1 Type1 related block
        elif REGEXP_DICTIONARY['type1_state1_related regexp'].match(i) and state == "State1":
            #do_something

        elif ...
        elif ... 
        elif ... 
        elif ...

        #
        # State 2 Type2 related block
        elif REGEXP_DICTIONARY['type2_state2_related regexp'].match(i) and state == "State2":
            #do_something
        elif ...
        elif ... 
        elif ... 
        elif ...

2 answers:

Answer 0: (score: 0)

I think you should split your code into small logical parts, each of which performs a single action. Something like this:

def _get_object_type(obj):
    """I'm getting type of one object"""
    ...

def _process_type_1(type_1_object):
    """I'm processing type 1 objects"""
    ...

def _process_type_2(type_2_object):
    """I'm processing type 2 objects"""
    ...

def _process_object(obj, obj_type):
    """I'm processing object by types"""
    # single leading underscore, matching the helper names defined above
    if obj_type == "type_1":
        _process_type_1(obj)
    elif obj_type == "type_2":
        _process_type_2(obj)
    ...

def populate(raw_input):
    """I'm populating populated dict from raw_input"""
    populated = {}

    for elem in raw_input:
        elem_type = _get_object_type(elem)
        processed_elem = _process_object(elem, elem_type)
        ...    

That way your code becomes much clearer, and every small piece of it is easy to understand :).
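To make the outline above concrete, here is a minimal runnable sketch of the same decomposition. The line formats, the `TYPE_PATTERNS` and `PROCESSORS` tables and the shape of the returned dictionaries are all assumptions made up for illustration; the real patterns would come from the asker's `REGEXP_DICTIONARY`.

```python
import re

# Hypothetical patterns, invented for this sketch only.
TYPE_PATTERNS = {
    "type_1": re.compile(r"^type1\s+(?P<name>\w+)"),
    "type_2": re.compile(r"^type2\s+(?P<name>\w+)"),
}


def get_object_type(line):
    """Return (type_name, match) for the first pattern that matches, else (None, None)."""
    for type_name, pattern in TYPE_PATTERNS.items():
        match = pattern.match(line)
        if match:
            return type_name, match
    return None, None


def process_type_1(match):
    """Build a type 1 object from its match."""
    return {"kind": "type_1", "name": match.group("name")}


def process_type_2(match):
    """Build a type 2 object from its match."""
    return {"kind": "type_2", "name": match.group("name")}


# Lookup table instead of an if/elif chain: one entry per object type.
PROCESSORS = {"type_1": process_type_1, "type_2": process_type_2}


def populate(raw_input):
    """Populate a dict of objects, keyed by name, from the raw input lines."""
    populated = {}
    for line in raw_input:
        type_name, match = get_object_type(line)
        if type_name is None:
            continue  # not an interesting line
        obj = PROCESSORS[type_name](match)
        populated[obj["name"]] = obj
    return populated
```

Adding a third object type then means adding one pattern and one processor function, without touching `populate`.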

Answer 1: (score: 0)

Python's re module supports named groups with the syntax (?P<name>...)

This means you can create regular expressions like this:

state1_regexp = r"(?P<state1>some text that matches state1)"
state2_regexp = r"(?P<state2>some different text for state2)"

Then you can glue your regular expressions together into one giant alternation:

all_states = '|'.join([state1_regexp, state2_regexp])

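Now you have a regular expression like this:

(?P<state1>...)|(?P<state2>...)

If you match against this catch-all regular expression, you get a result whenever any of the patterns hits: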

m = re.search(all_states, text)

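You can inspect the result with the m.groupdict() method, which returns a dictionary of all the named subgroups and what they matched. If the value for a named subgroup key is None, that subgroup did not match. Note that re.search returns None when nothing matches at all, so m has to be checked before groupdict() is called.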

states = { k:v for k,v in m.groupdict().items() if v is not None}
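Since `re.search` returns `None` when no alternative hits, the comprehension above would raise an `AttributeError` on a non-matching line. A small guarded sketch, where `matched_states` is a hypothetical helper name and the `foo`/`bar` patterns are placeholders:

```python
import re

# Placeholder patterns standing in for the real state regexps.
all_states = "|".join([r"(?P<state1>foo)", r"(?P<state2>bar)"])


def matched_states(text):
    """Return the named groups that took part in the match, or {} if nothing matched."""
    m = re.search(all_states, text)
    if m is None:
        return {}
    return {k: v for k, v in m.groupdict().items() if v is not None}
```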

Here is a demo:

import re
state1 = r'(?P<state1>foo)'
state2 = r'(?P<state2>bar)'
all_re = '|'.join([state1, state2])
text = "eat your own foo"
m = re.search(all_re, text)
states = {k:v for k,v in m.groupdict().items() if v is not None}
print(states)

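Once you have the states dictionary, you can confirm that it has only one key (only one state matches at a time). Or not - maybe two states can match at the same time!

Either way, you can iterate over the keys and call state-specific code, using attribute names, a dictionary lookup of functions, or whatever technique you prefer: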
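As far as I can tell, a single match against a flat alternation like this one takes exactly one branch, so the match object's `lastgroup` attribute already names the alternative that hit. To see every state mentioned in a piece of text, `finditer` can be used instead of `search`. A minimal sketch, with `states_in` as a hypothetical helper and `foo`/`bar` as placeholder patterns:

```python
import re

# Flat alternation of named groups, with no nested groups inside them.
all_re = re.compile("|".join([r"(?P<state1>foo)", r"(?P<state2>bar)"]))


def states_in(text):
    """Return the group name of every successive match found in text, in order."""
    return [m.lastgroup for m in all_re.finditer(text)]
```

This relies on the named groups not containing further groups; with nested groups it is safer to fall back to filtering `groupdict()`.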

def handle_state1():
    pass
def handle_state2():
    pass
dispatch = {
    'state1' : handle_state1,
    'state2' : handle_state2,
}

for k in states.keys():
    dispatch[k]()
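Putting the pieces together for the line-by-line case in the question, one possible shape is a small parser object that remembers the current state and dispatches each line by the name of the alternative that matched. Everything about the input format here (the `== State1 ==` headers, the `jar <name> <type>` lines) and the `Parser` class itself is invented for illustration; it is a sketch of the technique, not the asker's actual data.

```python
import re

# Hypothetical line formats, invented for illustration only.
LINE_RE = re.compile("|".join([
    r"(?P<state1>^== State1 ==$)",
    r"(?P<state2>^== State2 ==$)",
    r"(?P<jar>^jar \w+ Type[12]$)",
]))

# Which jar type is expected inside which section.
EXPECTED_TYPE = {"State1": "Type1", "State2": "Type2"}


class Parser:
    def __init__(self):
        self.state = None    # current document section
        self.jars = {}       # jar name -> jar type
        self.problems = []   # (name, jar_type, state) for inconsistent lines

    def on_state1(self, line):
        self.state = "State1"

    def on_state2(self, line):
        self.state = "State2"

    def on_jar(self, line):
        _, name, jar_type = line.split()
        if jar_type != EXPECTED_TYPE.get(self.state):
            self.problems.append((name, jar_type, self.state))
        self.jars[name] = jar_type

    def feed(self, lines):
        for line in lines:
            m = LINE_RE.match(line)
            if m is None:
                continue  # uninteresting line
            # lastgroup names the alternative that matched; call on_<name>
            getattr(self, "on_" + m.lastgroup)(line)
        return self
```

Each line is tested against one compiled pattern instead of a chain of elifs, and a new line kind needs only one more alternative plus one more `on_...` method.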