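Split by special criteria (Perl)

Time: 2011-10-28 01:28:39

Tags: perl

I have a line made up of groups of one, two, or three words that start with a lowercase letter, followed by a colon, followed by some (arbitrary) words that start with an uppercase letter:

Example (arbitrary):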

alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi

Splitting criterion: any number of words starting with a lowercase letter, followed by a colon.

Example output:

alpha: Beta
beta gamma: Alpha Beta
gamma beta gamma: Omega Omega
omega alpha: Gamma Omega Phi

A little help would be appreciated. Thanks.

5 answers:

Answer 0 (score: 2)

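use strict;
use warnings;

# Sample input from the question
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";

my $lcword  = qr!\b[a-z]+!;      # all-lowercase word
my $ucfword = qr!\b[A-Z][a-z]+!; # word with a leading uppercase letter
my @list = $string =~ m!((?:$lcword|\s)+: (?:$ucfword|\s)+)!g;
s/\s+$// for @list;              # each match also swallows the whitespace before the next label
print join("\n", @list), "\n";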

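Answer 1 (score: 0)

Use a substitution instead of a split.

$string =~ s/(stuff that must precede a newline)(stuff that must follow a newline)/$1\n$2/g;

The trailing g makes the substitution global. The first paren group should match any number of words starting with an uppercase letter, and the second should match any number of lowercase words followed by a colon (or semicolon).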
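
As a minimal sketch of that idea, assuming the sample string from the question: the names $string and $out and the concrete patterns below are illustrative choices, not something the question prescribes. The whitespace between the two groups is matched outside the parens so that it is replaced by the newline.

```perl
use strict;
use warnings;

# Sample input from the question
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";

# $1: the capitalized word that ends a record
# $2: one or more lowercase words followed by a colon (the next label)
# The whitespace between them is replaced by a newline.
(my $out = $string) =~ s/([A-Z][a-z]*)\s+((?:[a-z]+\s+)*[a-z]+:)/$1\n$2/g;

print "$out\n";
```

Only the last capitalized word of each record needs to be captured, since the earlier ones are left untouched by the substitution.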

Answer 2 (score: 0)

In your example, the semicolon (did you mean colon?) doesn't seem to be load-bearing. Try this:

#!/usr/bin/env perl

use strict;
use warnings;

while (<DATA>) {       # for each line, look for
  s! \b([A-Z][a-z]+)   #  - a capitalized word
       \s+             #  - followed by whitespace
     ([a-z]+)          #  - followed by a lowercased word
   !$1\n$2!xg;         # and turn that whitespace into a newline
  print;
}

__END__
alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi

Prints:

alpha: Beta
beta gamma: Alpha Beta
gamma beta gamma: Omega Omega
omega alpha: Gamma Omega Phi

Answer 3 (score: 0)

You may prefer split, but if you want to do it with a regex match:

#!/usr/bin/perl -w

use strict;

my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi";

my @list = $string =~ /(\b[a-z]\w+(?: [a-z]\w+){0,2}: [A-Z]\w+(?: [A-Z]\w+)*)/g;

print "$_\n" for @list;

Answer 4 (score: 0)

Is this what you want?

use strict; use warnings;
my $string="alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
    ."Omega Omega omega alpha: Gamma Omega Phi";
# The capture group makes split return each label between the value lists
my @list = split /\s([^A-Z]+|\s+):\s+/, $string;
# The first element is still "alpha: Beta"; split it into label and value
my @first = split /:\s*/, $list[0];
shift @list;
@list = (@first,@list);
print $string.$/;
print $_,$/ for @list;
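
If label/value pairs are the goal rather than a flat alternating list, a global match with two capture groups may be simpler. This is only a sketch, assuming every label word is all-lowercase and every value word starts with an uppercase letter; @pairs and the patterns are hypothetical names and choices made for illustration.

```perl
use strict;
use warnings;

# Sample input from the question
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";

my @pairs;   # hypothetical container: one [label, values] array ref per record
while ($string =~ /\b((?:[a-z]+\s+)*[a-z]+):\s+((?:[A-Z][a-z]*\s*)+)/g) {
    my ($label, $values) = ($1, $2);
    $values =~ s/\s+$//;            # drop the whitespace before the next label
    push @pairs, [$label, $values];
}

print "$_->[0] => $_->[1]\n" for @pairs;
```

Keeping the pairs in an array rather than a hash preserves the input order and tolerates repeated labels.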