
Time: 2009-05-12 07:38:04

Tags: regex perl

From this string: (champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] []), I want to extract the first number after the word "debut" and the first number after the word "fin". I wrote this:

while (my $readfile = <FILE>) #read each line and check the first value X1 after the word "coorDeb" and the first value X2 after the word "coorFin"
    my ($line) = $_;
    chomp ($line);

    ($first, $second)= ~m/coorDeb/\s\S*\s\S*\s\S*\s\S*\s\S*; #CoorDeb first, following by X1

    $X1=$first; $X4=$second;
    $lenght1=$second-$first; # Calculation of the lenght of first segment

    $line  =~ m//coorFin/(\s*)\S*\s*\S*\s*\S*\s*\S*\s*(\S*/); #CoorFin first, following by X1
    $lenght2=$second-$first; # Calculation of the lenght of first segment

    push(@elements1, $lenght1); #Push the lenght into a table to compute the mean of lenght for the segment n°1
    push(@elements2, $lenght2); #Push the lenght into a table to compute the mean of lenght for the segment n°2


2 answers:

Answer 0: (score: 4)



use strict;
use warnings;

my @elements;
while (<DATA>) {
  my $line = $_;
  # In list context, /g returns every captured group from every match.
  push @elements, $line =~ /debut (\d+).*?fin (\d+)/g;
}

print join ',', @elements;
print "\n";
__DATA__
(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])



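($line isn't even needed, because m// operates on $_ by default, but I left it in in case you need to do other processing on it. And push @elements, /debut (\d+).*?fin (\d+)/g; is more obfuscated than I think appropriate.)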
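For comparison, here is a minimal sketch of the shorter form mentioned above, where the match runs against $_ implicitly. The two sample lines are an assumption: shortened fragments in the question's format rather than real file input.

```perl
use strict;
use warnings;

# Assumed sample input: shortened lines in the question's format.
my @lines = (
    '(debut 144825 25345) (fin 244102 40647)',
    '(debut 796443 190570) (fin 145247 42663)',
);

my @elements;
for (@lines) {
    # No "$line =~" here: the match operator works on $_ by default.
    push @elements, /debut (\d+).*?fin (\d+)/g;
}

print join(',', @elements), "\n";
```

It behaves the same as the version with $line; the only difference is how visible the match target is to the reader.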


To keep the "debut" and "fin" numbers in separate arrays instead:

push @debuts, $line =~ /debut (\d+)/g;
push @fins, $line =~ /fin (\d+)/g;
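With the numbers in two arrays, the segment lengths and their mean that the question asks about could be computed roughly as follows. This is only a sketch: it assumes that a length is the "fin" value minus the matching "debut" value and that the i-th debut pairs with the i-th fin. The names @lengths and $mean are illustrative, and the input line is a shortened version of the question's string.

```perl
use strict;
use warnings;

# Assumed input: a shortened version of the line from the question.
my $line = '(debut 144825 25345) (fin 244102 40647)), '
         . '(debut 796443 190570) (fin 145247 42663)';

my (@debuts, @fins);
push @debuts, $line =~ /debut (\d+)/g;
push @fins,   $line =~ /fin (\d+)/g;

# Assumption: length = fin - debut, pairing values by position.
my @lengths = map { $fins[$_] - $debuts[$_] } 0 .. $#debuts;

# Mean of the lengths, guarding against an empty list.
my $sum = 0;
$sum += $_ for @lengths;
my $mean = @lengths ? $sum / @lengths : 0;

print "lengths: @lengths\n";
print "mean: $mean\n";
```

Pairing by index only works if every "debut" has a matching "fin"; if that is not guaranteed, matching both in a single pattern is safer.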

Answer 1: (score: 0)



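Note that I have changed your file reading, because you were reading into a variable and then processing $_, which is (almost certainly) not what you meant.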

while (my $line = <FILE>) { # read each line from FILE.
    chomp ($line);

    # These two lines could be combined but this is a little clearer.
    # Matching against [0-9] because \d matches all unicode digits.
    my ($fin_digits) = $line =~ /fin\s+([0-9]+)/;
    my ($debut_digits) = $line =~ /debut\s+([0-9]+)/; # as above.

    # Continue processing below...
}
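The comment about \d can be checked with a small sketch. The test string is an assumption chosen for illustration: it uses U+0663 (ARABIC-INDIC DIGIT THREE), a character that Perl treats as a digit but that is not in the ASCII range 0-9.

```perl
use strict;
use warnings;

# Assumed test string: "fin " followed by U+0663 ARABIC-INDIC DIGIT THREE.
my $text = "fin \x{0663}";

# \d accepts any Unicode decimal digit; [0-9] accepts ASCII digits only.
my $matches_d     = $text =~ /fin\s+(\d+)/    ? 1 : 0;
my $matches_ascii = $text =~ /fin\s+([0-9]+)/ ? 1 : 0;

print "\\d matched: $matches_d\n";
print "[0-9] matched: $matches_ascii\n";
```

If the captured text is later used in arithmetic, restricting the class to [0-9] avoids capturing digits that Perl will not treat as a number.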




If a line can contain several debut/fin pairs, loop over the matches instead:

while (my $line = <FILE>) { # read each line from FILE.
    chomp ($line);

    # Matching against [0-9] because \d matches all unicode digits.
    # In scalar context (as a while condition), m//g finds one match per
    # iteration and resumes where the previous match ended, so each
    # debut/fin pair is delivered in order through $1 and $2.
    while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) {
        my ($debut, $fin) = ($1, $2);
        # result processing here.
    }
}
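As a self-contained check of the pair-by-pair approach, the sketch below runs a scalar-context m//g loop over the string from the question. The array @pairs is an illustrative name, and storing each pair as an array reference is just one possible choice.

```perl
use strict;
use warnings;

# The line from the question.
my $line = '(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])';

my @pairs;
# Scalar-context /g: each pass resumes after the previous match.
while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) {
    push @pairs, [$1, $2];    # keep each debut/fin pair together
}

print "debut=$_->[0] fin=$_->[1]\n" for @pairs;
```

Keeping each pair together avoids any mismatch between separate debut and fin lists when one of the two words is missing from a record.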
