Question

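I'm still new to regular expressions and I've run into a problem. I'm building a parsing script, and I need to be able to extract lines of a certain length from a file.

How do I write a regular expression that matches lines with a certain number of words? For example, I want to match all lines in a file that contain 3 words.

Can I extend it to find all lines within a given range? For example, I want to match all lines in a file that contain between 2 and 5 words.

I'm using Perl, in case it matters. Thanks!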

Answer 1

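It depends on what you consider a word. Perl 5 treats a word as /\w+/. If you have a different definition, you will need to supply it.

You can find the number of times a regex matches using the countof secret operator, =()=: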

my $count = ()= $line =~ /\w+/g;

Once you know the number of words, it is easy to build an if statement with the >= and <= operators that prints a line if its word count falls between two numbers.
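
A minimal sketch of that counting approach, assuming the default /\w+/ definition of a word. The helper names count_words and in_range are hypothetical, introduced only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count the /\w+/ matches in a line.
sub count_words {
    my ($line) = @_;
    my $count = () = $line =~ /\w+/g;
    return $count;
}

# Hypothetical helper: true if the word count lies between $min and $max, inclusive.
sub in_range {
    my ($line, $min, $max) = @_;
    my $count = count_words($line);
    return $count >= $min && $count <= $max;
}

# Print only the sample lines that have 2 to 5 words.
for my $line ("one", "one two three", "one two three four five six") {
    print "$line\n" if in_range($line, 2, 5);
}
```

Passing the same number as both bounds, for example in_range($line, 3, 3), checks for an exact word count. Note that under /\w+/ an apostrophe splits a word, so "don't" counts as two.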

In Perl 5.10 and later, you can use possessive quantifiers to match 2 to 5 words:

#!/usr/bin/perl

use strict;
use warnings;

while (my $line = <DATA>) {
    next unless $line =~ /^(?:\W*+\w++){2,5}$/;
    print $line;
}

__DATA__
one
one two
one two three
one two three four
one two three four five
one two three four five six
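
The same pattern can be adapted to the "exactly 3 words" case from the question by changing the quantifier. The following is a sketch under that assumption; word_count_regex is a hypothetical helper, not part of the answer above:

```perl
use strict;
use warnings;

# Hypothetical helper: build the possessive-quantifier pattern for
# lines holding between $min and $max runs of \w characters.
sub word_count_regex {
    my ($min, $max) = @_;
    return qr/^(?:\W*+\w++){$min,$max}$/;
}

# Using the same bound twice asks for an exact count.
my $exactly_three = word_count_regex(3, 3);

for my $line ("one two\n", "one two three\n", "one two three four\n") {
    print $line if $line =~ $exactly_three;
}
```

Because the pattern ends right after the last word, a line with trailing punctuation or spaces would not match in this sketch.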

Answer 2

(Chas's answer isn't quite right: he missed the flag on the m// operator.) :)

use strict;
use warnings;

use Data::Dumper;

my @good;
foreach my $line (<DATA>)
{
    chomp $line;
    my $matches =()= ($line =~ /\b\w+\b/g);
    print "(debugging) found matches $matches\n";
    push @good, $line if $matches >= 2 and $matches <= 5;
}

print "matching lines: ", Dumper(\@good);

__DATA__
foo bar baz bap
foo bar baz
blah blah blah foooo

bip

produces:

(debugging) found matches 4
(debugging) found matches 3
(debugging) found matches 4
(debugging) found matches 0
(debugging) found matches 1
matching lines: $VAR1 = [
          'foo bar baz bap',
          'foo bar baz',
          'blah blah blah foooo'
        ];

Answer 3

Replace 3 with the number of words you are looking for. This regex assumes that no spaces or tabs start the line:

^(?=(\b[a-zA-Z0-9]+\b[\x20]){3})(.)*

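This says to match: from the start of each line, look ahead for 3 words made of alphanumerics or periods, each followed by a single space; if the lookahead matches, select the entire line, whatever it contains.

Note: \x20 matches the space character, and the regex was written from memory and by hand in Notepad++.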
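
To see how that lookahead behaves in Perl, here is a small sketch, assuming the pattern exactly as written above; starts_with_three_words is a hypothetical name used only for this example:

```perl
use strict;
use warnings;

# The lookahead pattern from the answer, with 3 as the word count.
my $three_words = qr/^(?=(\b[a-zA-Z0-9]+\b[\x20]){3})(.)*/;

# Hypothetical helper: 1 if the line begins with three words,
# each followed by a single space; 0 otherwise.
sub starts_with_three_words {
    my ($line) = @_;
    return $line =~ $three_words ? 1 : 0;
}

for my $line ("one two three four", "one two", " one two three four") {
    print "$line\n" if starts_with_three_words($line);
}
```

As written, the lookahead requires each of the three words to be followed by a space, so in this sketch a line of exactly three words with no trailing space does not match, while longer lines do. A line that starts with a space is also rejected, which is the assumption stated in the answer.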

Answer 4

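Here is a KISS (keep it simple) way.

while (<>) {
    # assumption: words are separated by whitespace
    # split ' ' also skips leading whitespace, so no empty first field
    my @s = split ' ';
    # now check the length of @s and do if/else,
    # e.g. print lines that have 2 to 5 words
    print if @s >= 2 && @s <= 5;
}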

Matching lines by word count with a regular expression
