Perl: extracting strings based on pattern matching

Time: 2012-08-17 04:48:51

Tags: string perl pattern-matching

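File FILE1 has thousands of lines containing strings terminated by the pattern _Pattern1.

A second file, FILE2, also has thousands of lines with the same terminating pattern, _Pattern1.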

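I now need to:

  • Read FILE1 line by line

  • Find out whether the line contains any string ending in _Pattern1

  • Extract that string and store it in a variable

  • Open FILE2 and read it line by line

  • Find out whether the line just read from FILE2 contains the string stored in the variable above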

How can this be done in Perl?
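The steps above can be sketched in Perl roughly as follows. This is a minimal illustration and not the asker's code. It treats `_Pattern1` as a literal suffix and uses in-memory filehandles (`open` on a scalar reference) in place of FILE1 and FILE2, so it runs without real files. The helper names `extract_tokens` and `file_contains` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical suffix standing in for _Pattern1 from the question.
my $suffix = '_Pattern1';

# Return every run of non-whitespace characters up to and including $sfx.
# \Q...\E escapes the suffix so any regex metacharacters in it match literally.
sub extract_tokens {
    my ($line, $sfx) = @_;
    return $line =~ /(\S*\Q$sfx\E)/g;
}

# Return true if any line read from $fh contains $needle as a plain substring.
# The handle is rewound first so it can be reused for every needle.
sub file_contains {
    my ($fh, $needle) = @_;
    seek $fh, 0, 0 or die "seek failed: $!";
    while (my $line = <$fh>) {
        return 1 if index($line, $needle) != -1;
    }
    return 0;
}

# Hypothetical sample data; in-memory handles stand in for FILE1 and FILE2.
my $file1_text = "alpha FOO_Pattern1 beta\nno match here\nBAR_Pattern1\n";
my $file2_text = "uses FOO_Pattern1 somewhere\n";
open my $file1, '<', \$file1_text or die $!;
open my $file2, '<', \$file2_text or die $!;

while (my $line = <$file1>) {
    for my $token (extract_tokens($line, $suffix)) {
        my $status = file_contains($file2, $token) ? 'found' : 'missing';
        print "$token: $status\n";
    }
}
```

With the sample data this reports `FOO_Pattern1` as found and `BAR_Pattern1` as missing. Rewinding FILE2 for every token mirrors the step-by-step description; for large files, reading FILE2 once is cheaper.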

EDIT2:

OK, after a bit of googling and referring to the links listed below, I solved my problem. Here is the code snippet.

#!/usr/bin/perl
use strict;
use warnings;

my $OriginalHeader=$ARGV[0]; ## Source file
my $GeneratedHeader=$ARGV[1];## File to compare against
my $DeltaHeader=$ARGV[2];    ## File to store misses

my $MatchingPattern="_Pos";
my $FoundPattern;

open FILE1, $OriginalHeader or die $!;
open FILE2, $GeneratedHeader or die $!;
open (FILE3, ">$DeltaHeader") or die $!;

my $lineFromOriginalHeader;
my $lineFromGeneratedHeader;
my $TotalMacrosExamined = 0;
my $TotalMacrosMissed = 0;

while($lineFromOriginalHeader=<FILE1>)
{
 if($lineFromOriginalHeader =~ /$MatchingPattern/)
  {
    my $index = index($lineFromOriginalHeader,$MatchingPattern);

    my $BackIndex = $index;
    my $BackIndexStart = $index;

    $BackIndex = $BackIndex - 1;

    ## Use this while loop to extract the substring. 
    while (1)
    {
      my $ExtractedChar = substr($lineFromOriginalHeader,$BackIndex,1);
      if ($ExtractedChar =~ / /)
      {
        $FoundPattern = substr($lineFromOriginalHeader,$BackIndex + 1,$BackIndexStart + 3 - $BackIndex);
        print "Identified $FoundPattern \n";
        $TotalMacrosExamined = $TotalMacrosExamined + 1;
        ##Skip the next line
        $lineFromOriginalHeader = <FILE1>;
        last;       
      }
     else
     {
      $BackIndex = $BackIndex - 1;
     }

   } ##while(1)

 ## We now look for $FoundPattern in FILE2
 while ($lineFromGeneratedHeader = <FILE2>)
 {
  if (index($lineFromGeneratedHeader,$FoundPattern)!= -1)
   {
     ##Pattern found. Reset file pointer and break out of while loop
     seek FILE2,0,0;
     last;
   }
   else
   {
     if (eof(FILE2) == 1)
      {         
        print FILE3 "Generated header misses $FoundPattern\n";
        $TotalMacrosMissed = $TotalMacrosMissed + 1;
        seek FILE2,0,0; 
        last;       
      }
   }
} ##while(<FILE2>)

}
else
{
  ##NOP
}
} ##while (linefromoriginalheader)

close FILE1;
close FILE2;
close FILE3;
print "Total number of bitfields examined = $TotalMacrosExamined\n";
print "Number of macros obsolete = $TotalMacrosMissed\n";
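The length argument `$BackIndexStart + 3 - $BackIndex` in the backward scan is easy to misread. A small check on a sample line of the form `#define Something_Pos(Some Value)` (assumed here as test input) shows that it selects exactly the token ending in `_Pos`. Like the original loop, this sketch assumes a space appears somewhere before `_Pos`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line    = "#define Something_Pos(Some Value)\n";
my $pattern = "_Pos";

my $index = index($line, $pattern);   # start of "_Pos": 17
my $back  = $index - 1;
# Walk left until a space is found (assumes one exists before "_Pos").
$back-- while substr($line, $back, 1) ne ' ';   # stops at 7

# The token starts at $back + 1 and ends at $index + 3 inclusive
# (3 is length("_Pos") - 1), so its length is
# ($index + 3) - ($back + 1) + 1 = $index + 3 - $back.
my $length = $index + 3 - $back;
my $token  = substr($line, $back + 1, $length);

print "index=$index back=$back length=$length token=$token\n";
```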

2 answers:

Answer 0 (score: 0)

I have been programming in C all my life, so I googled the Perl constructs below and wrote a C-like program. It works perfectly for me. :-)

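Edit: To clarify why I have to skip a line in the algorithm below: the pattern that is extracted and later searched for in the second file occurs on two consecutive lines, so reliably detecting its first occurrence is enough. One more detail: the substring containing the pattern is always guaranteed to be the second substring on the line.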

For example: #define Something_Pos(Some Value)
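Given the guarantee above that the `_Pos` name is the second token of a `#define` line, a single anchored regex can replace the character-by-character scan. This is a sketch under that assumption; the helper `macro_name` and the extra sample lines are hypothetical. Unlike a plain substring search, `\b` also rejects names that merely contain `_Pos` followed by more word characters, such as `Something_Position`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes relevant lines look like "#define NAME_Pos...": the macro name is
# the second whitespace-separated token and ends in "_Pos".
# Returns the name, or undef if the line does not match.
sub macro_name {
    my ($line) = @_;
    return $line =~ /^\s*#\s*define\s+(\w+_Pos)\b/ ? $1 : undef;
}

my @lines = (
    "#define Something_Pos(Some Value)\n",   # example from the answer
    "#define Other_Pos 5\n",                 # hypothetical
    "int unrelated;\n",                      # hypothetical
);

for my $line (@lines) {
    my $name = macro_name($line);
    print defined($name) ? "Identified $name\n" : "skipped\n";
}
```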

#!/usr/bin/perl
use strict;
use warnings;

my $OriginalHeader=$ARGV[0];
my $GeneratedHeader=$ARGV[1];
my $DeltaHeader=$ARGV[2];

my $MatchingPattern="_Pos";
my $FoundPattern;

open FILE1, $OriginalHeader or die $!;
open FILE2, $GeneratedHeader or die $!;
open (FILE3, ">$DeltaHeader") or die $!;

my $lineFromOriginalHeader;
my $lineFromGeneratedHeader;
my $TotalMacrosExamined = 0;
my $TotalMacrosMissed = 0;

while($lineFromOriginalHeader=<FILE1>)
{
 if($lineFromOriginalHeader =~ /$MatchingPattern/)
 {
  my $index = index($lineFromOriginalHeader,$MatchingPattern);

  my $BackIndex = $index;
  my $BackIndexStart = $index;

  $BackIndex = $BackIndex - 1;

  ## Use this while loop to extract the substring. 
  while (1)
  {
   my $ExtractedChar = substr($lineFromOriginalHeader,$BackIndex,1);
   if ($ExtractedChar =~ / /)
    {
     $FoundPattern = substr($lineFromOriginalHeader,$BackIndex + 1,$BackIndexStart + 3 - $BackIndex);
     print "Identified $FoundPattern \n";
     $TotalMacrosExamined = $TotalMacrosExamined + 1;
     ##Skip the next line
     $lineFromOriginalHeader = <FILE1>;
     last;       
    }
   else
    {
     $BackIndex = $BackIndex - 1;
    }

} ##while(1)

 ## We now look for $FoundPattern in FILE2
while ($lineFromGeneratedHeader = <FILE2>)
{
 ##print "Read the following line from FILE2: $lineFromGeneratedHeader\n";

  if (index($lineFromGeneratedHeader,$FoundPattern)!= -1)
   {
     ##Pattern found. Rewind FILE2 and break out of the while loop
     seek FILE2,0,0;
     last;
   }
   else
   {
     if (eof(FILE2) == 1)
      {         
        print FILE3 "Generated header misses $FoundPattern\n";
        $TotalMacrosMissed = $TotalMacrosMissed + 1;
        seek FILE2,0,0; 
        last;       
      }
   }
 } ##while(<FILE2>)

}
else
{

}
} ##while (linefromoriginalheader)

close FILE1;
close FILE2;
close FILE3;
print "Total number of bitfields examined = $TotalMacrosExamined\n";
print "Number of macros obsolete = $TotalMacrosMissed\n";
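In the backward scan above, `substr` with a negative offset counts from the end of the string, so if no space precedes `_Pos` (for instance, when the name starts the line) the loop wraps around and can return the wrong text or fail to terminate. A guarded alternative, sketched here under the same "token ends in `_Pos`" rule, uses the built-in `rindex` with a start position, which searches leftwards and returns -1 when there is no space. The helper name `token_ending_in` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the token ending in $pattern, starting just after the nearest space
# to its left, or at the start of the line if there is no such space.
# Returns undef (empty return) if $pattern does not occur in $line.
sub token_ending_in {
    my ($line, $pattern) = @_;
    my $end = index($line, $pattern);
    return if $end == -1;
    # Last space at or before $end; -1 + 1 gives 0 when there is none.
    my $start = rindex($line, ' ', $end) + 1;
    return substr($line, $start, $end + length($pattern) - $start);
}

for my $line ("#define Something_Pos(Some Value)\n", "Foo_Pos\n", "none\n") {
    my $token = token_ending_in($line, '_Pos');
    print defined($token) ? "Identified $token\n" : "no match\n";
}
```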

Answer 1 (score: 0)

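A first attempt at making your code more Perlish. More could be done, including naming variables like $some_var, which is the usual Perl style, instead of $SomeVar, but I didn't go that far.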

#!/usr/bin/perl
use strict;
use warnings;

my ($OriginalHeader, $GeneratedHeader, $DeltaHeader) = @ARGV;
my $MatchingPattern=qr/(\S*_Pos)/; # all non-whitespace terminated by _Pos

open my $file1, '<', $OriginalHeader  or die $!;
open my $file2, '<', $GeneratedHeader or die $!;
open my $file3, '>', $DeltaHeader     or die $!;

my $TotalMacrosExamined = 0;
my $TotalMacrosMissed = 0;

while(my $lineFromOriginalHeader=<$file1>) {
  next unless $lineFromOriginalHeader =~ $MatchingPattern;
  my $FoundPattern = $1; # matched string

  print "Identified $FoundPattern \n";
  $TotalMacrosExamined++;

  ##Skip the next line
  <$file1>;

  ## We now look for $FoundPattern in FILE2
  my $match_found = 0;
  while (my $lineFromGeneratedHeader = <$file2>) {
    if (index($lineFromGeneratedHeader,$FoundPattern)!= -1) {
      ##Pattern found. Flag it and break out of the while loop (FILE2 is rewound below)
      $match_found++;
      last;
    } 
  }

  unless ($match_found) {
    print $file3 "Generated header misses $FoundPattern\n";
    $TotalMacrosMissed++;
  }

  seek $file2,0,0;

}

print "Total number of bitfields examined = $TotalMacrosExamined\n";
print "Number of macros obsolete = $TotalMacrosMissed\n";
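The version above still rescans FILE2 from the top (`seek $file2,0,0`) for every name found in FILE1, so the work grows with the product of the two file sizes. Assuming the generated header fits in memory, one variation reads it once into a single string and keeps the same `index` substring test. Because `\S*_Pos` never matches a newline, a hit in the joined text always lies within one original line. The sub `missing_macros` is a hypothetical name, and this is a sketch rather than a drop-in replacement: it prints all "Identified" lines after the scan instead of interleaving them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan the original header and return two array refs: every name ending in
# _Pos that was examined, and those not found anywhere in $generated_text.
sub missing_macros {
    my ($original_fh, $generated_text) = @_;
    my (@examined, @missing);
    while (my $line = <$original_fh>) {
        next unless $line =~ /(\S*_Pos)/;    # same pattern as the answer above
        my $name = $1;
        push @examined, $name;
        <$original_fh>;    # skip the next line, as the original script does
        push @missing, $name if index($generated_text, $name) == -1;
    }
    return (\@examined, \@missing);
}

# Command-line use mirrors the original: ORIGINAL GENERATED DELTA.
if (@ARGV == 3) {
    my ($original, $generated, $delta) = @ARGV;

    # Read the generated header once instead of seeking back per macro.
    open my $gen_fh, '<', $generated or die "$generated: $!";
    my $generated_text = do { local $/; <$gen_fh> };   # slurp mode
    $generated_text = '' unless defined $generated_text;
    close $gen_fh;

    open my $in,  '<', $original or die "$original: $!";
    open my $out, '>', $delta    or die "$delta: $!";

    my ($examined, $missing) = missing_macros($in, $generated_text);
    print "Identified $_\n" for @$examined;
    print {$out} "Generated header misses $_\n" for @$missing;
    close $out or die "$delta: $!";

    printf "Total number of bitfields examined = %d\n", scalar @$examined;
    printf "Number of macros obsolete = %d\n", scalar @$missing;
}
```

It takes the same three arguments as the scripts above; when run without them (for example, when the sub is exercised directly), the command-line block is skipped.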