Parsing a specific number of lines given by a previously parsed value

Time: 2011-06-02 22:31:24

Tags: parsing pyparsing

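I'm using pyparsing to ingest the gEDA schematic/symbol file format. Most of it is straightforward, but I'm not sure how to match a number of following lines when that number is given by an integer field on the initial line.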

Text objects have the following format:

(other objects)
T x y color size vis snv angle align num_lines
Text line one
Line two of the text
Finally, the 'num_lines'th line
(other objects)

num_lines is an integer. The same style is also used by several other object types.

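As a workaround, I define these lines as anything that does not match a valid object type. Technically, though, lines that look like objects are allowed inside a text object: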

text_meta = Type("T") + coord + color + size + visibility + show_name_value \
            + angle + alignment + num_lines + EOL
text_data_line = ~obj + LineStart() + SkipTo(LineEnd()) + EOL
text_data = Group(OneOrMore(text_data_line)).setResultsName('text')
text_data = text_data.setParseAction(lambda t: '\n'.join(t[0]))
text = text_meta + text_data

Generating the matching rule on the fly, for example:

def genLineMatcher(n):
    return (LineStart() + SkipTo(LineEnd()) + EOL)*n

is on the table, but I'm not sure how to specify such a rule.

2 answers:

Answer 0: (score: 0)

  

Generating the matching rule on the fly...

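You are actually on the right track. The way to create a rule dynamically is to define the variable-length expression as a Forward(), and then have a parse action on the count field insert the actual rule once the count has been parsed.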
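
A minimal sketch of that Forward-plus-parse-action idea, in case it helps to see it spelled out. The header is cut down to a hypothetical "T <num_lines>", and the names text_line, text_lines and set_line_count are made up for illustration; the camelCase method names match the pyparsing API used elsewhere in this thread.

```python
from pyparsing import (And, Empty, Forward, Group, LineEnd, Literal,
                       OneOrMore, Suppress, Word, nums, restOfLine)

EOL = Suppress(LineEnd())
text_line = restOfLine + EOL   # one raw line of text (hypothetical name)
text_lines = Forward()         # placeholder, filled in once the count is known

def set_line_count(tokens):
    n = int(tokens[0])
    # Plain "<<" (not "<<=") so text_lines is not rebound as a local name.
    text_lines << (Group(And([text_line] * n)) if n else Group(Empty()))
    return []                  # drop the count itself from the results

num_lines = Word(nums).setParseAction(set_line_count, callDuringTry=True)

# Assumption: simplified header, just "T <num_lines>" instead of the full field list.
text = Literal("T") + num_lines + EOL + text_lines

sample = "T 2\nThis is line 1\nline 2 is here...\nT 1\nOnly line\n"
result = OneOrMore(Group(text)).parseString(sample)
print(result.asList())
```

Each time a count is parsed, the parse action re-points the Forward at an expression for exactly that many lines, so the same grammar handles objects with different line counts.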

Fortunately, pyparsing already implements this in the helper method countedArray. If you change your expression to:

text_meta = (Type("T") + coord + color + size + visibility + show_name_value +
               angle + alignment + countedArray(EOL + restOfLine)("lines"))

I think this will do what you want. You can then retrieve the array of lines using the "lines" results name.
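
For a quick standalone check, here is a cut-down sketch of the countedArray approach. The "T <num_lines>" header and the flatten helper are assumptions for illustration only. The helper is there because, as far as I can tell, different pyparsing releases nest the counted items differently, so the sketch reads the flattened token list instead of relying on one particular grouping or on a results name.

```python
from pyparsing import LineEnd, Suppress, Word, alphas, countedArray, restOfLine

EOL = Suppress(LineEnd())

# Assumption: simplified header "T <num_lines>". countedArray parses the
# integer itself, so there is no separate num_lines expression here.
text = Word(alphas) + countedArray(EOL + restOfLine)

def flatten(items):
    """Flatten nested lists so the result does not depend on grouping."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

result = text.parseString("T 2\nThis is line 1\nline 2 is here...\n")
tokens = list(flatten(result.asList()))
object_type, lines = tokens[0], tokens[1:]
print(object_type, lines)
```

Note that each counted item is EOL + restOfLine, so the line break that ends the header (and each text line) is consumed at the start of the next item rather than at the end of the current one.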

Answer 1: (score: 0)

The pyparsing helper function 'countedArray(expr)' is almost what is needed. Here are the parser definition and a modified version of the helper:

def numLinesList(expr, name=None):
    """Helper to snarf an end-of-line integer and match 'expr' N times after.
    Almost exactly like pyparsing.countedArray.
    Matches patterns of the form::
        ... num_lines
        line one
        line two
        num_lines'th line
    """
    arrayExpr = Forward()
    def numLinesAction(s, l, t):
        n = int(t[0])
        # n+1 matches: the first one consumes the (empty) rest of the num_lines line
        arrayExpr << (n and Group(And([expr]*(n+1))) or Group(empty))
        return []
    matcher = Word(nums).setParseAction(numLinesAction, callDuringTry=True) \
              + arrayExpr
    # remove first empty string
    matcher.addParseAction(lambda t: [t[0][1:]])
    if name:
        matcher = matcher.setResultsName(name)
    return matcher

text_meta = Type("T") + coord + color + size + visibility + show_name_value \
        + angle + alignment
text_data_line = SkipTo(LineEnd()) + EOL
text_data = numLinesList(text_data_line, 'text')
text = text_meta + text_data

Input excerpt:

...
T 41600 47800 9 10 1 0 0 0 2
This is line 1
line 2 is here...
T 41600 47000 9 10 1 0 0 0 2
Another first line
second line foo

Output:

['T', 41600, 47800, '9', 10, True, '0', 0, 0, ['This is line 1', 'line 2 is here...']]
['T', 41600, 47000, '9', 10, True, '0', 0, 0, ['Another first line', 'second line foo']]
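
As a cross-check on the grouping shown above, the same count-prefixed layout can be read with a few lines of plain Python. This is only a sketch of the format rule (the last header field says how many raw lines follow), not a replacement for the pyparsing grammar; read_text_objects is a hypothetical name and the header fields are left as unconverted strings.

```python
from itertools import islice

def read_text_objects(source):
    """Yield (header_fields, text_lines) for each 'T' object in source."""
    lines = iter(source.splitlines())
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "T":
            continue  # other object types are ignored in this sketch
        num_lines = int(fields[-1])
        # Take exactly num_lines raw lines, even if they look like objects.
        yield fields[:-1], list(islice(lines, num_lines))

sample = """\
T 41600 47800 9 10 1 0 0 0 2
This is line 1
line 2 is here...
T 41600 47000 9 10 1 0 0 0 2
Another first line
second line foo
"""

objects = list(read_text_objects(sample))
for header, text_lines in objects:
    print(header, text_lines)
```

Because the text lines are taken by count rather than by pattern, a text line that happens to start with "T" is still treated as text, which is the case the original workaround could not handle.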