Question

I am working with a language that defines modules as

<module_name> <inst_name>(.<port_name> (<net_name>)….);

or

module1 inst1 ( .input a,
.output b;
port b=a;);

I want to find all such modules while ignoring function calls.

I am having difficulty with the regex. I am looking for this:

 text1 text2 ( .text3; text4 );

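Note that all of the whitespace, except the whitespace between text1 and text2, is optional and may be newlines rather than spaces. text3 and text4 can span multiple lines, but each always has the following form:

text3 ->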
.blah1 (blah2),
.blah3 (blah4)

text4 ->
blah1 blah2=xyz;
blah3 blah4=qwe;

I am trying

 re.split(r"^[a-zA-Z]*\s[a-zA-Z]*\s?\n?\([a-zA-Z]*\s?\n?;[a-zA-Z]*\);", data)

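It does not work, though; it just grabs everything. How do I fix it? Thanks! I do need to grab everything separately (module/instance/port/net). I think that once the regex works, I can split the result.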

Answer 1

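I think you need to write a parser that understands enough of the language to at least canonicalize it before you try to extract information. You could write a simple parser by hand, or you could use a parsing framework such as PLY or one of the other similar tools.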

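To give you a more concrete idea of what I am suggesting, consider the following code, which defines a parse_data function that, given the contents of a file, yields the series of tokens recognized in that file: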

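import re

# Token names mapped to the regular expressions that recognize them.
tokens = {
    'lparen': r'\(',
    'rparen': r'\)',
    'comma': r',',
    'semicolon': r';',
    'whitespace': r'\s+',
    'equals': r'=',
    'identifier': r'[.\d\w]+',
}

tokens = dict((k, re.compile(v)) for k, v in tokens.items())

def parse_data(data):
    """Yield (token_name, matched_text) pairs for the given file contents."""
    while data:
        for tn, tv in tokens.items():
            mo = tv.match(data)
            if mo:
                matched = mo.group(0)
                data = data[mo.end():]
                yield tn, matched
                break
        else:
            # No pattern matched: fail loudly instead of looping forever.
            raise ValueError('unrecognized input: %r' % data[:20])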

Using this function, you can write something that puts your sample input into a canonical form:

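with open('inputfile') as fd:
    data = fd.read()

last_token = (None, None)
for tn, tv in parse_data(data):
    if tn == 'whitespace' and last_token[0] != 'semicolon':
        # Replace each whitespace run with a fixed gap,
        # except right after a semicolon.
        print(' ', end=' ')
    elif tn == 'whitespace':
        pass
    elif tn == 'semicolon' and last_token[0] == 'rparen':
        # ");" closes a module instance, so end the output line here.
        print(tv)
    else:
        print(tv, end=' ')

    last_token = (tn, tv)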

Given input like this:

module1 inst1 ( .input a,
.output b;
port b=a;);
module2 inst2 ( .input a, .output b; port b=a;);

module3 inst3 ( .input a, .output b;


port b=a;);

The above code would produce:

module1   inst1   (   .input   a ,   .output   b ; port   b = a ; ) ;
module2   inst2   (   .input   a ,   .output   b ; port   b = a ; ) ;
module3   inst3   (   .input   a ,   .output   b ; port   b = a ; ) ;

Because the output is in a standard form, it is much more amenable to information extraction through simple pattern matching.
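As a rough illustration of that last step, here is a minimal sketch that pulls the module, instance, ports, and assignments out of one canonicalized line. The names LINE_RE and extract are hypothetical, and the pattern assumes the "module instance ( ports ; assignments ) ;" layout from the sample output; other layouts would need their own pattern.

```python
import re

# Hypothetical pattern for one canonicalized line of the form
#   <module> <instance> ( <port list> ; <assignments> ) ;
LINE_RE = re.compile(
    r'^(\w+)\s+(\w+)\s*\(\s*(.*?)\s*;\s*(.*?)\s*;?\s*\)\s*;$'
)

def extract(line):
    """Return (module, instance, ports, assignments), or None if no match."""
    mo = LINE_RE.match(line.strip())
    if mo is None:
        return None
    module, instance, port_text, assign_text = mo.groups()
    # Ports are comma-separated; assignments are semicolon-separated.
    ports = [p.split() for p in port_text.split(',') if p.strip()]
    assigns = [a.split() for a in assign_text.split(';') if a.strip()]
    return module, instance, ports, assigns

line = 'module1   inst1   (   .input   a ,   .output   b ; port   b = a ; ) ;'
print(extract(line))
```

Because the pattern requires two identifiers before the opening parenthesis, a plain function call such as `foo ( a , b ) ;` does not match and `extract` returns None for it.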

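Note that while this code relies on reading the entire source file into memory first, you could fairly easily write code that parses the file in fragments if you are worried about memory utilization.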
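One possible way to parse in fragments is sketched below. It is an assumption about how such a reader could look, not part of the code above: tokenize_stream and TOKEN_RE are hypothetical names, and the token classes are simply combined into a single pattern with named groups. A token that touches the end of the buffer is held back, since it may continue in the next chunk.

```python
import io
import re

# The same token classes, combined into one pattern with named groups.
TOKEN_RE = re.compile(
    r'(?P<lparen>\()|(?P<rparen>\))|(?P<comma>,)|(?P<semicolon>;)'
    r'|(?P<whitespace>\s+)|(?P<equals>=)|(?P<identifier>[.\w]+)'
)

def tokenize_stream(fd, chunk_size=4096):
    """Yield (token_name, text) pairs while reading fd in fixed-size chunks."""
    buf = ''
    while True:
        chunk = fd.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        pos = 0
        while pos < len(buf):
            mo = TOKEN_RE.match(buf, pos)
            if mo is None:
                raise ValueError('unrecognized input: %r' % buf[pos:pos + 20])
            # A token touching the end of the buffer may continue in the
            # next chunk, so hold it back until more data (or EOF) arrives.
            if mo.end() == len(buf) and not at_eof:
                break
            yield mo.lastgroup, mo.group()
            pos = mo.end()
        buf = buf[pos:]  # keep only the unconsumed tail
        if at_eof:
            break

sample = io.StringIO('module1 inst1 ( .input a,\n.output b;\nport b=a;);\n')
for name, text in tokenize_stream(sample, chunk_size=8):
    print(name, repr(text))
```

With this approach only the unconsumed tail of the input stays in memory, and the token stream should not depend on the chunk size.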
