Perl regex grouping

Time: 2015-05-26 23:34:52

Tags: regex perl

I need help splitting the following string into key/value pairs.

Example:

string='1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger'

I want to split the string above on the pattern "[0-9]:Name - ".

So I need key/value pairs like the following:

1256789=>David - This is assigned to David
345678=>Mike - This order 000345 assigned to Mike
456901=>Roger - This is assigned to Roger

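4 Answers:

Answer 0: (score: 0)

From your problem description, it isn't clear whether your input string has any delimiter between records; in other words, are the records simply strung together with spaces? If so, the problem gets a bit trickier. In that case, I would just peel the records off the string one chunk at a time: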

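use strict;
use warnings;
# Concatenating these to suppress side-to-side scrolling:
my $string='1256789: David - This is assigned to David'
    . ' 345678: Mike - This order 000345 assigned to Mike'
    . ' 456901: Roger - This is assigned to Roger';
my %orders;
while ( defined $string && length $string ) {
    # Capture the order number, its description, and whatever records remain.
    my ($order, $desc, $rest) = $string =~ /^(\d+):\s+(.*?)\s*(\d+:.*)?$/
        or last;    # stop on malformed input rather than storing an undefined key
    $orders{$order} = $desc;
    $string = $rest;    # undef after the last record, which ends the loop
}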

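At this point, %orders will contain what you want. This approach is a little awkward. With a different regex and the g modifier, you could pull all of the records out in a single expression, but I'll leave that as an exercise.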
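One possible take on that exercise, offered as a minimal sketch and not as the answerer's own solution: in list context, a match with the g modifier returns the captures of every match, so the flat key, value, key, value list can initialise the hash directly. The lookahead used to end each description is an assumption about the data, namely that a new record always starts with whitespace, digits, and a colon.

```perl
use strict;
use warnings;

my $string = '1256789: David - This is assigned to David'
    . ' 345678: Mike - This order 000345 assigned to Mike'
    . ' 456901: Roger - This is assigned to Roger';

# In list context, m//g returns the captures of every match in order,
# so the resulting (key, value, key, value, ...) list fills the hash.
# The lookahead stops each description before the next "digits:" or at
# the end of the string, without consuming anything.
my %orders = $string =~ /(\d+):\s+(.*?)(?=\s+\d+:|\s*$)/g;

# Hash order is not defined, so sort the keys numerically for display.
print "$_ => $orders{$_}\n" for sort { $a <=> $b } keys %orders;
```

Note that "000345" in Mike's record is not mistaken for a key, because it is not followed by a colon.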

Answer 1: (score: 0)

Here's another way to do the job. It uses a positive lookahead (not the negative one I mentioned in a comment).

File: pattern.pl

#!/usr/bin/env perl
use strict;
use warnings;

my $string='1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger';

while ($string =~ m/(\d+): (\w+ - .*?)(?=\s*\d+: \w+ -|$)/g)
{
    print "$1 == $2\n";
}

Sample run:

$ perl pattern.pl | so
1256789 == David - This is assigned to David
345678 == Mike - This order 000345 assigned to Mike
456901 == Roger - This is assigned to Roger
$

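The regex looks for a sequence of one or more digits, a colon, a space, a sequence of word characters, and a dash, followed by a non-greedy run of any characters up to the trailing context. The trailing context is either optional whitespace, some digits, a colon, a word, and a dash, or else the end of the string. The g modifier makes the match apply repeatedly.

You can tidy up the regex by using \s+ in place of the literal spaces, and add the x modifier so that it can be spread out for easier reading: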

while ($string =~ m/(\d+): \s+ (\w+ \s+ - \s+ .*?)(?=\s*\d+: \s+ \w+ \s+ - \s+ |$)/gx)

You can modify the print to show that no trailing whitespace is included:

print "[$1] == [$2]\n";

which yields:

[1256789] == [David - This is assigned to David]
[345678] == [Mike - This order 000345 assigned to Mike]
[456901] == [Roger - This is assigned to Roger]

And so on.
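Since the question asks for key/value pairs, here is a hedged sketch of the same pattern laid out one piece per line under the x modifier, with the matches stored in a hash instead of printed as they are found. The names $record and %pairs are introduced here for illustration and do not come from the answer.

```perl
use strict;
use warnings;

my $string = '1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger';

# The same pattern as above, compiled once and commented piece by piece.
my $record = qr/
    (\d+) : \s+                        # key: digits, then a colon
    ( \w+ \s+ - \s+ .*? )              # value: name, dash, then as little text as possible
    (?=                                # trailing context, matched but not consumed:
        \s* \d+ : \s+ \w+ \s+ - \s+    #   either the start of the next record
      | $                              #   or the end of the string
    )
/x;

my %pairs;    # illustrative name, not from the answer
while ( $string =~ /$record/g ) {
    $pairs{$1} = $2;
}

print "[$_] == [$pairs{$_}]\n" for sort { $a <=> $b } keys %pairs;
```

Compiling the pattern with qr// keeps the loop condition short and lets the comments sit next to the pieces they describe.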

Answer 2: (score: 0)

All it takes is a couple of simple split operations

use strict;
use warnings;

my $string = '1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger';

my @assignments = split /\s+(?=\d+:)/, $string;
my %assignments = map { split /\s*:\s*/, $_, 2 } @assignments;

use Data::Dump;

dd \%assignments;

Output

{
  345678  => "Mike - This order 000345 assigned to Mike",
  456901  => "Roger - This is assigned to Roger",
  1256789 => "David - This is assigned to David",
}
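To see what each split contributes, the two steps can be inspected separately. This is a sketch using only core Perl for the output (Data::Dump is a CPAN module); the array name @records is introduced here for illustration.

```perl
use strict;
use warnings;

my $string = '1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger';

# Step 1: split on whitespace that is immediately followed by "digits:".
# The lookahead keeps the digits in the next record instead of discarding them.
my @records = split /\s+(?=\d+:)/, $string;

# Step 2: split each record on its first colon only. The limit of 2 means
# any later colon in the description would stay part of the value.
my %assignments = map { split /\s*:\s*/, $_, 2 } @records;

print scalar(@records), " records\n";
print "$_ => $assignments{$_}\n" for sort { $a <=> $b } keys %assignments;
```

The order shown in the dump above is just how the dumper chose to list the keys; a hash itself has no defined order, which is why this sketch sorts the keys before printing.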

Answer 3: (score: -1)

Fairly easy with a regex.

$hash{$1} = $2

 ( \d+ )                       # (1)
 \s* : \s*
 ( \w+ )                       # (2)

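Edit: I actually thought you were looking for a general idea of how to build the regex.
I was reluctant to give you code because in Perl there is more than one way to do it, and I couldn't tell from anything in your question whether you even know Perl.

Now you are asking for complete, ready-to-ship code. Here it is...
Go from the regex to a one-line hash assignment and you're done!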

 use strict;
 use warnings;
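 $/ = "";    # paragraph mode: read up to the next blank line (here, all of DATA)
 my $input = <DATA>;
 # In list context, /g returns every (key, value) capture pair, which fills the hash.
 my %hash = $input =~ /(\d+)\s*:\s*((?s:(?!\d+\s*:).)*)/g;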
 for (keys %hash) {
    print "$_ => $hash{$_}\n\n";
 }
 __DATA__
 1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger

Output:

 456901 => Roger - This is assigned to Roger

 1256789 => David - This is assigned to David

 345678 => Mike - This order 000345 assigned to Mike
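Two details of this approach are worth a note: the keys come back in no particular order, and each captured value keeps whatever whitespace precedes the next key (or the final newline). A hedged sketch of one way to tidy both, assuming the same input held in a plain string instead of DATA:

```perl
use strict;
use warnings;

# Same data as in the answer, held in a string for this sketch.
my $input = "1256789: David - This is assigned to David 345678: Mike - This order 000345 assigned to Mike 456901: Roger - This is assigned to Roger\n";

my %hash = $input =~ /(\d+)\s*:\s*((?s:(?!\d+\s*:).)*)/g;

# values() returns aliases to the stored values, so this trims them in place.
s/\s+\z// for values %hash;

# Sort numerically so the output order is stable from run to run.
for my $key ( sort { $a <=> $b } keys %hash ) {
    print "$key => $hash{$key}\n";
}
```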