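I have a database with a number of fields containing comma-separated values. I need to split these fields in Perl, which is straightforward enough, except that some of the values are followed by nested CSVs contained in parentheses, and I do not want to split those.
Example: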
recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education
Splitting on ", " gives me:
recycling
environmental science
interdisciplinary (e.g.
consumerism
waste management
chemistry
toxicology
government policy
and ethics)
consumer education
What I want is:
recycling
environmental science
interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics)
consumer education
Can any Perl regex(perts) lend a hand?
I tried modifying a regex string I found in a similar SO post, but it returned no results:
#!/usr/bin/perl
use strict;
use warnings;
my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts = $s =~ m{\A(\w+) ([0-9]) (\([^\(]+\)) (\w+) ([0-9]) ([0-9]{2})};
use Data::Dumper;
print Dumper \@parts;
Answer 0 (score: 9)
Try this:
my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts = split /(?![^(]+\)), /, $s;
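To spell out why this works: the negative look-ahead `(?![^(]+\))` refuses to split at any ", " from which a ")" can be reached without first crossing a "(", which is exactly the situation of a comma sitting inside parentheses. Below is a minimal runnable sketch of that idea. The helper name `split_outside_parens` is made up for illustration, and the sketch assumes the parentheses are balanced and not nested; with nested parentheses this pattern may split in the wrong place.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): split on ", " unless
# the comma lies inside a single, non-nested pair of parentheses.
sub split_outside_parens {
    my ($text) = @_;
    # (?![^(]+\)) fails when a ")" can be reached before any "(",
    # i.e. when the current position is inside parentheses.
    return split /(?![^(]+\)), /, $text;
}

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts = split_outside_parens($s);
print "$_\n" for @parts;
```

For the sample string this prints the four fields asked for in the question, with the parenthesised list kept in one piece.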
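Answer 1 (score: 3)
The solution you chose is superior, but for those who would say otherwise: regular expressions have a recursion construct that will match nested parentheses. The following works fine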
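use strict;
use warnings;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts;
# A field is a run of plain text and balanced parenthesised groups,
# terminated by a comma or by the end of the string.
push @parts, $1 while $s =~ /
    ((?:
        [^(),]+                     # text outside parentheses
      | ( \(
            (?: [^()]+ | (?2) )*    # group 2 calls itself for nested parentheses
        \) )
    )+)                             # one or more, so no empty match at the end
    (?: ,\s* | $ )
/xg;

print "$_\n" for @parts;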
even if the parentheses are nested further. No, it isn't pretty, but it does work!
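To illustrate the nesting claim, here is a small sketch that wraps the same recursive pattern in a sub and runs it on a made-up input with two levels of parentheses. The name `split_nested` and the test string are assumptions for illustration only. The sketch requires at least one piece per field (`+`), so empty fields are skipped, and it assumes the parentheses are balanced.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the recursive pattern; assumes balanced parentheses.
sub split_nested {
    my ($text) = @_;
    my @fields;
    push @fields, $1 while $text =~ /
        ( (?:
              [^(),]+                              # plain text
            | ( \( (?: [^()]+ | (?2) )* \) )       # balanced group, recursing via (?2)
          )+ )
        (?: ,\s* | $ )                             # field separator or end of string
    /xg;
    return @fields;
}

my @nested = split_nested('a, b (c, d (e, f), g), h');
print "$_\n" for @nested;
```

The commas inside both the inner and the outer parentheses are left alone, so the input comes back as three fields.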
Answer 2 (score: 0)
Who says you have to do it in one step? You can slice the values off in a loop. For example, you could use something like this.
use strict;
use warnings;
use 5.010;
use Data::Dumper;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts;
while (1) {
    # First try: a plain field (word characters and whitespace only), then the rest.
    my ($elem, $rest) = $s =~ m/^((?:\w|\s)+)(?:,\s*([^\(]*.*))?$/;
    if (not $elem) {
        # Otherwise: a field followed by one parenthesised group, then the rest.
        say "second approach";
        ($elem, $rest) = $s =~ m/^(?:((?:\w|\s)+\s*\([^\)]+\)),\s*(.*))$/;
    }
    $s = $rest;
    push @parts, $elem;
    last if not $s;
}
print Dumper \@parts;
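Answer 3 (score: 0)
Another approach, using a loop and split. I haven't tested performance, but shouldn't this be faster than the look-ahead regexp solution (as the length of $str grows)?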
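use strict;
use warnings;

my $str = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @elems = split /,\s*/, $str;
my (@answer, @parens);
while (@elems) {
    # Pieces without an opening parenthesis are complete fields.
    push @answer, shift(@elems) while @elems && $elems[0] !~ /\(/;
    last unless @elems;
    # Collect pieces up to the one that holds the closing parenthesis.
    push @parens, shift(@elems) while @elems && $elems[0] !~ /\)/;
    push @answer, join(', ', @parens, @elems ? shift(@elems) : ());
    @parens = ();
}
print "$_\n" for @answer;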
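Since the performance question is left open, one way to check it is with the core Benchmark module. The sketch below is only a starting point: the sub names `with_lookahead` and `with_rejoin` are made up, the loop guards against running off the end of `@elems`, and no timings are claimed here. Results will depend on the Perl version and on the input length.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

# Hypothetical name: the look-ahead split from the first answer.
sub with_lookahead {
    my ($text) = @_;
    return split /(?![^(]+\)), /, $text;
}

# Hypothetical name: plain split, then re-join the pieces between "(" and ")".
sub with_rejoin {
    my ($text) = @_;
    my @elems = split /,\s*/, $text;
    my (@answer, @parens);
    while (@elems) {
        push @answer, shift(@elems) while @elems && $elems[0] !~ /\(/;
        last unless @elems;
        push @parens, shift(@elems) while @elems && $elems[0] !~ /\)/;
        push @answer, join(', ', @parens, @elems ? shift(@elems) : ());
        @parens = ();
    }
    return @answer;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    lookahead => sub { my @r = with_lookahead($str) },
    rejoin    => sub { my @r = with_rejoin($str) },
});
```

To test the "as the length of $str grows" part, the same comparison can be repeated with a longer input, for example the sample string joined to itself several times with ", ".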