Question

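I want to capture the text in a Verilog file from the word 'module' through the word 'endmodule'. A Verilog file may contain several modules, so I want to pick out one specific module.

In addition, I want to ignore any 'endmodule' that appears inside a comment.

Sample Verilog file: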

module whatever
//endmodule
// endmodule
// asadsadadsa endmodule
// enasaa endmodule asas
/* endmodule */
endmodule // whatever
module nonsense
//
// bla bla
//
endmodule // nonsense

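Suppose I want to capture the module whatever from the sample above. I am using Perl in single-line mode (the /s modifier).

This is what I have so far: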

if ($content =~ m/(module\s+whatever[\s(#]?.*?endmodule(?:\s*\/\/\s*whatever)?)/s)
{
    print $1;
}
else
{
    print "NOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!!!!!!!!!!\n";
}

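As it stands, this matches only up to the first commented-out 'endmodule' (the '//endmodule' line), not the real one.

Any help or hints would be much appreciated.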

Answer 1

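This one is a bit tricky. The general idea is to distinguish all the different kinds of things you may need to match and to put them into an alternation that is repeated.

So what do we want to match?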

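Single-line comments: // up to the end of the line, no matter what.
Block comments: /* up to the next */, no matter what.
Anything else, as long as it does not start an endmodule.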

The last part can be done by applying a negative lookahead at every position of the repetition.
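As a minimal sketch of that last idea on its own, the snippet below repeats "one character, provided endmodule does not start here". The input string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "wire a; endmodule trailing";   # hypothetical input

# (?!endmodule) is checked before each single character is consumed,
# so the repetition stops right in front of the first "endmodule".
my ($before) = $text =~ /\A((?:(?!endmodule)[\s\S])*)/;

print "[$before]\n";
```

On its own this still stops at an endmodule inside a comment, which is why the comment alternatives are needed as well.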

So let's put it all together:

$content =~ m~
  module\s+whatever      # marks the start of the module
  (?:                    # each instance of this alternation matches one kind of
                         # module "token"
    //.*+                # match a single-line comment
  |                      # or
    /[*]                 # open a block comment
    (?:(?![*]/)[\s\S])*+ # match anything as long as it doesn't close the comment
    [*]/                 # close the block comment
  |                      # or
    (?!endmodule)[\s\S]  # match anything as long as it doesn't close the module
  )*+                    # repeat
  endmodule
  ~x

The trick is that the first two alternatives consume your comments, so only an endmodule outside of them is ever considered.
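For reference, here is one way the pattern might be wrapped into a complete script that captures the module text. This is only a sketch: the capture group, the `$name` and `$captured` variables, the `\Q...\E` quoting and the `\b` after the name are additions that are not part of the answer's pattern, and the input is the sample from the question. Possessive quantifiers need Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input from the question.
my $content = <<'END_VERILOG';
module whatever
//endmodule
// endmodule
// asadsadadsa endmodule
// enasaa endmodule asas
/* endmodule */
endmodule // whatever
module nonsense
//
// bla bla
//
endmodule // nonsense
END_VERILOG

my $name = 'whatever';     # hypothetical variable holding the module name

my $module_re = qr~
  (                        # capture the whole module (group 1)
    module\s+\Q$name\E\b   # start of the wanted module
    (?:
      //.*+                # single-line comment, up to the end of the line
    |
      /[*]                 # block comment: opener,
      (?:(?![*]/)[\s\S])*+ # body,
      [*]/                 # closer
    |
      (?!endmodule)[\s\S]  # any character that does not start "endmodule"
    )*+
    endmodule              # the first endmodule outside of any comment
  )
~x;

my $captured;
if ($content =~ $module_re) {
    $captured = $1;
    print "$captured\n";
}
else {
    print "module '$name' not found\n";
}
```

Note that the capture ends at the word endmodule itself, so the trailing "// whatever" comment on that line is not included.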

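The *+ are possessive quantifiers. They are (in most cases) an optimization, but the one after // and the one around the alternation are strictly necessary (otherwise backtracking could give you false positives).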
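The false positive can be reproduced with a small experiment. The sketch below uses a made-up input in which the only endmodule sits inside a comment, and compares the possessive pattern with an otherwise identical one that uses plain greedy quantifiers; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical broken input: the only "endmodule" is inside a comment.
my $broken = "module whatever\n// endmodule\nwire w;\n";

# The alternation with possessive quantifiers, as in the answer.
my $possessive = qr~
  module\s+whatever
  (?: //.*+ | /[*] (?:(?![*]/)[\s\S])*+ [*]/ | (?!endmodule)[\s\S] )*+
  endmodule
~x;

# The same pattern with plain greedy quantifiers, which may backtrack.
my $greedy = qr~
  module\s+whatever
  (?: //.* | /[*] (?:(?![*]/)[\s\S])* [*]/ | (?!endmodule)[\s\S] )*
  endmodule
~x;

my $possessive_hit = ($broken =~ $possessive) ? 1 : 0;
my $greedy_hit     = ($broken =~ $greedy)     ? 1 : 0;

print "possessive: $possessive_hit, greedy: $greedy_hit\n";
```

With the greedy version the engine, after failing to find a real endmodule, backtracks into the comment and accepts the commented-out one. The possessive version refuses to give anything back and reports no match.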

Working demo.

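However, since you are dealing with a standardized file format, you may be better off looking for an existing parser for this kind of file.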

Answer 2

Instead of running a single regex over the whole file, read it one line at a time.

#!/usr/bin/perl
use strict;
use warnings;

my ($file, $module) = qw(verilog.v whatever);

open(my $fh, '<', $file) or die "cannot read '$file': $!";
my $start = 0;   # becomes true once the header of the wanted module is seen
my $store = "";  # accumulated text of the wanted module
while (my $line = <$fh>) {
    die "nested module inside module '$module'" if $start && $line =~ m/^\s*module\W/;
    $start = 1 if $line =~ m/^\s*module\s+\Q$module\E\W/;
    $store .= $line if $start;
    # only an endmodule at the start of a line (after optional whitespace) closes the module
    if ($start and $line =~ m/^\s*endmodule/) {
        print $store;
        exit 0;
    }
}
die "cannot find module '$module' in file '$file'" if $start == 0;
die "missing endmodule for '$module'";

Using the sample Verilog file from the question, saved as verilog.v, the output is:

module whatever
//endmodule
// endmodule
// asadsadadsa endmodule
// enasaa endmodule asas
/* endmodule */
endmodule // whatever

Perl regex to capture the text between two anchor words while ignoring anchor words inside comments
