From this string:
(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])
I want to extract the first number after the word "debut" and the first number after the word "fin". I wrote this:
while (my $readfile = <FILE>) #read each line and check the first value X1 after the word "coorDeb" and the first value X2 after the word "coorFin"
{
my ($line) = $_;
chomp ($line);
($first, $second)= ~m/coorDeb/\s\S*\s\S*\s\S*\s\S*\s\S*; #CoorDeb first, following by X1
$X1=$first; $X4=$second;
$lenght1=$second-$first; # Calculation of the lenght of first segment
$line =~ m//coorFin/(\s*)\S*\s*\S*\s*\S*\s*\S*\s*(\S*/); #CoorFin first, following by X1
$lenght2=$second-$first; # Calculation of the lenght of first segment
push(@elements1, $lenght1); #Push the lenght into a table to compute the mean of lenght for the segment n°1
push(@elements2, $lenght2); #Push the lenght into a table to compute the mean of lenght for the segment n°2
}
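Can someone help me with the regex? Thanks.
Answer 0: (score: 4)
You are making this far too complicated by trying to count fields and work out offsets into the line. Assuming you are looking for matching debut/fin pairs, you could use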
#!/usr/bin/perl
use strict;
use warnings;
my @elements;
while (<DATA>) {
my $line = $_;
push @elements, $line =~ /debut (\d+).*?fin (\d+)/g;
}
print join ',', @elements;
print "\n";
__DATA__
(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])
This code produces the output
144825,244102,796443,145247
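($line isn't even needed, since m// operates on $_ by default, but I left it in there in case you actually need to do other processing on it. And push @elements, /debut (\d+).*?fin (\d+)/g; is more obfuscated than I consider appropriate.)
If you don't care about matching pairs, you could also use two separate arrays and replace the push line with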
push @debuts, $line =~ /debut (\d+)/g;
push @fins, $line =~ /fin (\d+)/g;
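As a minimal, self-contained sketch of the two-array variant, the lines above can be run against the sample line from the question. The variable names and the print format are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = '(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])';

my (@debuts, @fins);

# In list context, a //g match returns the captured group from every match.
push @debuts, $line =~ /debut (\d+)/g;
push @fins,   $line =~ /fin (\d+)/g;

print "debuts: @debuts\n";   # debuts: 144825 796443
print "fins: @fins\n";       # fins: 244102 145247
```

With this approach the two arrays are filled independently, so nothing guarantees that element N of @debuts belongs with element N of @fins if a line is missing one of the two fields.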
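Answer 1: (score: 0)
If I understand correctly, you just need to read a file and find two values. Those values are the runs of digits after the word 'fin' and after the word 'debut'. Right now, you are trying to match them by looking at what comes before the string you are interested in. Perhaps you should look for the actual information of interest instead.
In regular expressions, it is almost always better to look for the interesting text than to try to skip the uninteresting text. Something like the following would work better.
Note that I have changed your file reading, because you were reading into a variable and then processing $_, which is (almost certainly) not what you meant.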
while (my $line = <FILE>) #read each line from FILE.
{
chomp ($line);
# These two lines could be combined but this is a little clearer.
# Matching against [0-9] because \d matches all unicode digits.
my ($fin_digits) = $line =~ /fin\s+([0-9]+)/;
my ($debut_digits) = $line =~ /debut\s+([0-9]+)/; # as above.
# Continue processing below...
}
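Now, one difference is that your sample data shows multiple occurrences of fin and debut on a single line. If that is the case, you will need a slightly different regex. Let us know whether that is really so.
Update
Given that you do have matching pairs on the same line, you probably want something like the following. Again, I have only put in the regex matching, not the processing code. This code actually allows any number of pairs on a line.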
while (my $line = <FILE>) #read each line from FILE.
{
chomp ($line);
# Matching against [0-9] because \d matches all unicode digits.
# In scalar context, m//g finds one match per call and resumes where
# the previous match ended, so this loop visits each debut/fin pair
# on the line in order.
while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g)
{
my ($debut, $fin) = ($1, $2); # captured digits for this pair
# result processing here.
}
}
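A self-contained sketch of walking the debut/fin pairs one at a time, using the sample line from the question in place of a file handle. It relies on a //g match in scalar context, which advances through the string one match per loop iteration. The @pairs array and the print format are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = '(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])';

my @pairs;   # hypothetical container: one [debut, fin] entry per pair

# Each iteration finds the next "debut N ... fin M" pair on the line.
while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) {
    push @pairs, [$1, $2];
}

# Prints "144825,244102" then "796443,145247", one pair per line.
print join(',', @$_), "\n" for @pairs;
```

Keeping each pair together in one array element means any later processing of a debut value always has its matching fin value at hand.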