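I have a line made up of entries: one, two, or three words that start with a lowercase letter, followed by a colon, followed by some (arbitrary) words that start with an uppercase letter:
Example (arbitrary):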
alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi
Split criterion: any number of words starting with a lowercase letter, followed by a colon.
Example of the desired result:
alpha: Beta
beta gamma: Alpha Beta
gamma beta gamma: Omega Omega
omega alpha: Gamma Omega Phi
A little help would be appreciated. Thanks.
Answer 0 (score: 2)
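use strict;
use warnings;
# Sample input from the question.
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi";
my $lcword  = qr!\b[a-z]+!;       # all-lowercase word
my $ucfword = qr!\b[A-Z][a-z]+!;  # word with a leading uppercase letter
# Each match is a run of lowercase words, a colon, then a run of capitalized words.
my @list = $string =~ m!((?:$lcword|\s)+: (?:$ucfword|\s)+)!g;
s/\s+\z// for @list;              # drop the whitespace the last group picks up
print join("\n", @list), "\n";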
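Answer 1 (score: 0)
Use a substitution rather than a split.
$string =~ s/(stuff that must precede a newline)(stuff that must follow a newline)/$1\n$2/g;
The trailing g makes the substitution global. The first parenthesized group should match any number of words that start with an uppercase letter; the second should match any number of lowercase words followed by a colon (or semicolon).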
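As a minimal sketch of this approach, the two placeholders could be filled in as follows. Both patterns are assumptions based on the sample line in the question (words made of \w characters, keys ending in a colon), not something the answer specifies. The \s+ between the groups consumes the whitespace that the newline replaces.

```perl
use strict;
use warnings;

# Sample input from the question.
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";

# Group 1: the capitalized word that ends one entry (assumed pattern).
# Group 2: the lowercase words plus colon that start the next entry (assumed pattern).
(my $result = $string) =~ s/\b([A-Z]\w*)\s+((?:[a-z]\w*\s+)*[a-z]\w*:)/$1\n$2/g;

print "$result\n";
```

For the sample line this prints the four lines listed as the desired result in the question. Copying into $result first leaves the original $string untouched.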
Answer 2 (score: 0)
In your example, the semicolon (did you mean the colon?) doesn't seem to be load-bearing. Try this:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {             # for each line, look for
    s! \b([A-Z][a-z]+)       #  - a capitalized word
       \s+                   #  - followed by whitespace
       ([a-z]+)              #  - followed by a lowercased word
     !$1\n$2!xg;             # and turn that whitespace into a newline
    print;
}
__END__
alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi
Prints:
alpha: Beta
beta gamma: Alpha Beta
gamma beta gamma: Omega Omega
omega alpha: Gamma Omega Phi
Answer 3 (score: 0)
You might prefer split, but if you want to do it with a regular expression:
#!/usr/bin/perl -w
use strict;
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: Omega Omega omega alpha: Gamma Omega Phi";
my @list = $string =~ /(\b[a-z]\w+(?: [a-z]\w+){0,2}: [A-Z]\w+(?: [A-Z]\w+)*)/g;
print "$_\n" for @list;
Answer 4 (score: 0)
Is this what you want?
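use strict;
use warnings;
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";
# Split on each " key: " separator; the capture group keeps the key in the list.
my @list = split /\s([^A-Z]+|\s+):\s+/, $string;
# The first element is still "alpha: Beta", so split it at its colon as well.
my @first = split /:\s*/, shift @list;
@list = (@first, @list);
print $string, $/;        # the original line
print $_, $/ for @list;   # keys and values, one per line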
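The split above returns keys and values as separate list elements. If the goal is one "key: Value" line per entry, as in the question, a minimal sketch is to capture each pair in a loop and join it back together. The @pairs and @lines names and the exact pattern are assumptions for illustration, not part of the answer.

```perl
use strict;
use warnings;

# Sample input from the question.
my $string = "alpha: Beta beta gamma: Alpha Beta gamma beta gamma: "
           . "Omega Omega omega alpha: Gamma Omega Phi";

# $1: one or more lowercase words (the key), up to the colon.
# $2: one or more capitalized words (the value) that follow it.
my @pairs;
while ($string =~ /\b([a-z]+(?:\s+[a-z]+)*):\s+([A-Z]\w*(?:\s+[A-Z]\w*)*)/g) {
    push @pairs, [$1, $2];
}

# Rebuild one "key: value" line per captured pair.
my @lines = map { "$_->[0]: $_->[1]" } @pairs;
print "$_\n" for @lines;
```

Keeping the key and value apart in @pairs also makes it straightforward to load them into a hash later, if that turns out to be the real goal.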