Perl: remove unwanted lines from a string

Time: 2011-03-15 19:41:01

Tags: regex perl

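I'm writing a Perl script and want to filter out the lines that don't match a given regular expression. The problem is that I can't get it to work.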

I have the following lines:

"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/SeverityLevelCounter"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/FilterSet"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/util/Locale"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/CheckstyleException"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/PackageObjectFactory"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/DefaultContext"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/AutomaticBean"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/FileSetCheck"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/Filter"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/AuditListener"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/lang/StringBuilder"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/lang/Exception"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/io/File"

I want to remove every line where the part after the -> does not start with com/puppycrawl/tools/checkstyle/

So far, my script looks like this:

#! /usr/bin/perl -s

use File::Find;

our ($roll);

$dir = shift or die("Folder missing\n");
$prefix = shift;


$command = "javap -v";
$extension = "class";
$temp_file = "temp.tmp";

find(\&wanted, $dir);
sub wanted 
{
    if ($_ =~ /\.$extension$/)
    {
        push (@class_files, $File::Find::name);
    }
}

print "digraph G\n{\n";
    print "node [shape=box]\n";

    foreach $class (@class_files) {
        $class=~ s/(.*)\..*/$1/;
        $_result= `$command $class | grep " = class"`;
        $_result=~ s/.*\/\/ */\"$class\" -> /g;

        $_line.=$_result;
    }

    $_line=~ s/"$dir\//"/g;
    $_line=~ s/\[[A-Z]?//g;
    $_line=~ s/\;//g;
    $_line=~ s/->\s*(.*)/-> \"$1\"/g;   

6 answers:

Answer 0 (score: 4)

Something like this, maybe?

   perl -ne 'print if /->\s+"com\/puppycrawl\/tools\/checkstyle\//' filename.txt

Answer 1 (score: 2)

At the end of your script, you could add the following lines:

$_line = join("\n", grep { $_ =~ m{->\s+"com/puppycrawl/tools/checkstyle/} }
                       split(/\n/, $_line) );

Reading from back to front, this:

a.) splits $_line into separate lines

b.) filters out the unwanted lines with grep

c.) joins the remaining lines back into $_line
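
The three steps above can be sketched on a small sample as follows. This is only an illustration: the shortened "a/Checker" sample edges are made up to mirror the shape of the question's data, and the pattern keeps the lines whose target starts with com/puppycrawl/tools/checkstyle/.

```perl
use strict;
use warnings;

# Made-up sample edges in the same shape as the question's data.
my $_line = join "\n",
    '"a/Checker" -> "com/puppycrawl/tools/checkstyle/api/FilterSet"',
    '"a/Checker" -> "java/util/Locale"',
    '"a/Checker" -> "com/puppycrawl/tools/checkstyle/DefaultContext"';

# a.) split the string into separate lines
my @lines = split /\n/, $_line;

# b.) keep only the lines whose target starts with the wanted prefix
my @kept = grep { m{->\s+"com/puppycrawl/tools/checkstyle/} } @lines;

# c.) join the surviving lines back into a single string
$_line = join "\n", @kept;

print "$_line\n";
```

Note that join does not add a trailing newline, so one has to be appended when printing.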

Answer 2 (score: 1)

You can do this after creating the @class_spec list (which you can do with my @class_spec = split(/\n/, $_line);):

# only keep the class specifications that match the desired pattern
# (the target after the arrow is quoted, so the pattern includes the ")
@class_spec = grep {
    m#-> "com/puppycrawl/tools/checkstyle/#;
} @class_spec;

Answer 3 (score: 1)

You know that you don't have to use forward slashes as the regex delimiters, right?

foreach my $line (@list) {
   print "$line" if ($line =~ m(->\s+"com/puppycrawl/tools/checkstyle));
}

You can use almost any punctuation character as the delimiter after the m:

foreach my $line (@list) {
   print "$line" if ($line =~ m#->\s+"com/puppycrawl/tools/checkstyle#);
}

foreach my $line (@list) {
   print "$line" unless ($line =~ m@->\s+"com/puppycrawl/tools/checkstyle@);
}

This makes regular expressions that contain slashes much easier to write.
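
As a rough comparison, here is the same test written with three delimiter styles. The sample @list and the $prefix variable are assumptions made for this sketch; qr{} with \Q...\E is one way to build the pattern from a variable without escaping the slashes by hand.

```perl
use strict;
use warnings;

my $prefix = 'com/puppycrawl/tools/checkstyle/';   # assumed prefix to keep

# Made-up sample lines shaped like the question's data.
my @list = (
    qq{"x/Checker" -> "com/puppycrawl/tools/checkstyle/api/Filter"\n},
    qq{"x/Checker" -> "java/io/File"\n},
);

# Default delimiters: every slash in the path has to be escaped.
my @with_slashes = grep { /->\s+"com\/puppycrawl\/tools\/checkstyle\// } @list;

# Brace delimiters: the path can be written as-is.
my @with_braces = grep { m{->\s+"com/puppycrawl/tools/checkstyle/} } @list;

# Pattern built from a variable; \Q...\E quotes any regex metacharacters.
my @with_qr = grep { $_ =~ qr{->\s+"\Q$prefix\E} } @list;

print @with_braces;
```

All three greps select the same line; only the readability of the pattern differs.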

By the way, you can read the whole file into an array in one go:

open (MY_FILE, "file.txt") or die qq(A slow and painful death\n);
my @list = <MY_FILE>;
close (MY_FILE);    #No longer needed. It's in @list.

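Also, I hate File::Find because it breaks every rule of module writing. I wrote my own version that doesn't require you to put your whole program inside the wanted subroutine or to use global variables: http://db.tt/SSAw1x3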

Answer 4 (score: 0)

Maybe something like this will help:

use strict;
use warnings;

# the desired string
my $to_match = 'com/puppycrawl/tools/checkstyle/';

my $pattern = qr/   # compile the regex
    ->      # start matching from the arrow ->
    \s+     # which is followed by a space
    "       # and then a "
    $to_match   # finally the desired string
    /x; 

while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^\s*$/; 
    next unless $line =~ /$pattern/;
    print $line, "\n";
}


__DATA__
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/SeverityLevelCounter"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/FilterSet"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/util/Locale"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/CheckstyleException"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/PackageObjectFactory"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/DefaultContext"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/AutomaticBean"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/FileSetCheck"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/Filter"
"com/puppycrawl/tools/checkstyle/Checker" -> "com/puppycrawl/tools/checkstyle/api/AuditListener"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/lang/StringBuilder"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/lang/Exception"
"com/puppycrawl/tools/checkstyle/Checker" -> "java/io/File"

Answer 5 (score: -1)

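Using backticks in scalar context gives you the program's output as one single string, along with headaches about which characters can match a newline and whether a greedy regex will swallow several lines. If you use backticks in list context instead, your code might be cleaner: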

foreach $class (@class_files) {
    $class=~ s/(.*)\..*/$1/;
    my @_result = `$command $class | grep " = class"`;
    s{.*// *}{"$class" -> }g foreach @_result;
    $_line .= join '', @_result;
}
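
The difference between the two contexts can be seen in a small sketch. The command here is a hypothetical stand-in for the javap pipeline: it simply runs the current Perl interpreter ($^X) to print three lines, and it assumes a shell that accepts double-quoted arguments and an interpreter path without spaces.

```perl
use strict;
use warnings;

# Stand-in for `javap -v ... | grep ...` (an assumption for this sketch):
# the current perl interpreter prints the numbers 1 to 3, one per line.
my $cmd = qq{$^X -le "print for 1..3"};

my $as_string = `$cmd`;   # scalar context: one string with embedded newlines
my @as_lines  = `$cmd`;   # list context: one element per line of output

my $newlines = ($as_string =~ tr/\n//);   # count newlines in the single string
print "scalar context: 1 string containing $newlines newlines\n";
print "list context:   ", scalar(@as_lines), " elements\n";
```

With one element per line, a substitution applied to each element can no longer run across line boundaries.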