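Need help splitting this string (first and last name pairs separated by commas and "and")

Time: 2011-08-28 02:49:10

Tags: regex perl

I am using Perl and need to split a string of author names that are separated by commas, with a final "and" before the last name. Each name consists of a first name and a last name, like this: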

$string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";
$string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";
$string3 = "Jane Doe and Joe Smith";
# Next line doesn't work because there is no comma between last two names
@data = split(/,/, $string1);

I just want to split the full names into array elements, the way split() would, so that the @data array contains, for example:

$data[0]: "Joe Smith"
$data[1]: "Jason Jones"
$data[2]: "Jane Doe"
$data[3]: "Jack Jones"

The problem, however, is that there is no comma between the last two names in the list. Any help would be appreciated.

2 answers:

Answer 0 (score: 10)

You can split using a simple alternation in the regular expression:

my @parts = split(/\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*|\s+and\s+/, $string1)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $string2 = "Jane Doe and Joe Smith";print join("\n",split(/\s*,\s*|\s+and\s+/, $string2)),"\n"'
Jane Doe
Joe Smith

If you also have to handle the Oxford comma (i.e. "this, that, and the other"), then you can use:

my @parts = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jack Jones

Thanks to stackoverflowuser2010 for pointing out this case.
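As a minimal sketch, the three-branch pattern can be wrapped in a small helper and run against every input form from the question. The name split_authors and the fourth test string are hypothetical and were added only for illustration; that last string checks that "and" inside a name (Sandy, Roland, Brand) is not treated as a separator, because the pattern requires whitespace on both sides of the word.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): wraps the three-branch split.
sub split_authors {
    my ($authors) = @_;
    return split /\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $authors;
}

my @inputs = (
    "Joe Smith, Jason Jones, Jane Doe and Jack Jones",    # no Oxford comma
    "Joe Smith, Jason Jones, Jane Doe, and Jack Jones",   # Oxford comma
    "Jane Doe and Joe Smith",                             # two names only
    "Sandy Anderson and Roland Brand",                    # "and" inside names
);

for my $input (@inputs) {
    my @names = split_authors($input);
    print join(" | ", @names), "\n";
}
```

Note that the match is case-sensitive, so a capitalized "And" between names would not be split; adding the /i flag would be one way to allow it, if that case can occur in your data.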

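You want the \s*,\s*and\s+ branch at the beginning so that the other branches of the alternation don't split on the comma or on the "and" first. This order appears to be guaranteed as well; according to the Perl documentation:

"Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen."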
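To see why the branch order matters, here is a small sketch that compares the pattern from the answer with a deliberately reordered variant. The reordered pattern is only an illustration of what goes wrong and is not something the answer suggests using.

```perl
use strict;
use warnings;

my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Order from the answer: the ", and" branch is tried first.
my @good = split /\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s;

# Reordered for illustration only: the bare-comma branch is tried first,
# so it consumes ", " and leaves "and" glued to the last name.
my @bad = split /\s*,\s*|\s*,\s*and\s+|\s+and\s+/, $s;

print "good: ", join(" | ", @good), "\n";
print "bad:  ", join(" | ", @bad),  "\n";

# good: Joe Smith | Jason Jones | Jane Doe | Jack Jones
# bad:  Joe Smith | Jason Jones | Jane Doe | and Jack Jones
```

In the reordered version the comma branch also swallows the space after the comma, so the \s+and\s+ branch can no longer match: there is no whitespace left in front of "and".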

Answer 1 (score: 4)

Before the split, replace the "and" with a comma:

$string1 =~ s{\s+and\s+}{,}g;
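The substitution above still has to be followed by a split. Used as written and followed by split(/,/, ...), it would leave leading spaces on some names, and an Oxford comma would turn into ",," and produce an empty element. The following is one possible variation, not the answer's exact code: it assumes an optional comma may precede "and", and it splits on a comma with surrounding whitespace. The variable $normalized is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Work on a copy so the original string is left untouched.
# Assumption: an optional comma before "and" is folded into one separator.
(my $normalized = $string2) =~ s{\s*,?\s+and\s+}{,}g;

# Split on commas, discarding any whitespace around them.
my @data = split /\s*,\s*/, $normalized;

print "$_\n" for @data;
```

The same two statements also handle the strings without an Oxford comma, since the comma in the substitution pattern is optional.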