Question

I have a list of authors like this:

AU  - Garrett-Bakelman, Francine E
AU  - Sheridan, Caroline K
AU  - Kacmarczyk, Thadeous J
AU  - Ishii, Jennifer
AU  - Betel, Doron
AU  - Alonso, Alicia
AU  - Mason, Christopher E
AU  - Figueroa, Maria E
AU  - Melnick, Ari M

I read it with a Perl script:

#!/usr/bin/env perl

use strict; use warnings;
my @authors;
open my $fh, '<', '/home/con/Downloads/pmcid-PMC4354670.ris' or die "Can't read file: $!";
while (<$fh>) {
    if ($_ =~ m/^AU\s+-         #line starts with 'AU'
    \s+                         #whitespace
    (.*)                        #author is represented by non-newline characters, saved as $1
    /x) {
        push @authors, $1;
    }   
}
close $fh;
printf("there are %u authors\n", scalar @authors);
foreach my $author (@authors) {
    print "$author\n";#prints each element correctly
}
print "@authors\n";#but prints the concatenation incorrectly, 'Melnick, Ari Ma Er E Jine E'
print join ' and ', @authors;#prints 'and Melnick, Ari Ma Er E JE'

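I can't join the list of strings correctly. I tried the 'join' function, and I also tried concatenating onto a string as I read the file, but the result is always a jumble.

How can I join an array of strings correctly?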

Answer 1

Your file /home/con/Downloads/pmcid-PMC4354670.ris should be converted from the DOS line-ending convention to the UNIX one with the dos2unix command.

The trailing '\r' character at the end of each string is what causes the problem.
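To illustrate why the trailing '\r' matters, here is a minimal sketch that uses in-memory sample lines instead of the real file. The sample data and the `strip_eol` helper are hypothetical, added only for illustration. A carriage return sends the terminal cursor back to the start of the line, so each joined name overwrites the previous one on screen. Stripping it before capturing avoids that.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines as they would be read from a DOS-format file.
my @lines = ("AU  - Betel, Doron\r\n", "AU  - Alonso, Alicia\r\n");

# Remove a trailing LF or CRLF (illustrative helper, not from the original post).
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my @authors;
for my $line (@lines) {
    my $clean = strip_eol($line);
    push @authors, $1 if $clean =~ /^AU\s+-\s+(.*)/;
}

print join(' and ', @authors), "\n";    # Betel, Doron and Alonso, Alicia
```

This keeps the original regular expression unchanged and handles both DOS and UNIX line endings, at the cost of one extra substitution per line.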

Answer 2

Following on from BOC's answer, you can solve the problem without using dos2unix by changing < to <:crlf in the open call:

open my $fh, '<:crlf', '/home/con/Downloads/pmcid-PMC4354670.ris';

Perl then "converts pairs of CR,LF to a single "\n" newline character".
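A self-contained way to try the :crlf layer is sketched below. It writes a small DOS-format temporary file (hypothetical sample data standing in for the .ris file) and reads it back through the layer. This is an illustration under those assumptions, not the asker's actual script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small DOS-format sample file (hypothetical stand-in for the .ris file).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;    # raw bytes, so "\r\n" is written exactly as given
print {$out} "AU  - Betel, Doron\r\n", "AU  - Alonso, Alicia\r\n";
close $out or die "Can't close $path: $!";

# Read it back through the :crlf layer, which turns each CR,LF pair into "\n".
open my $fh, '<:crlf', $path or die "Can't read $path: $!";
my @authors;
while (my $line = <$fh>) {
    chomp $line;
    push @authors, $1 if $line =~ /^AU\s+-\s+(.*)/;
}
close $fh;

print join(' and ', @authors), "\n";    # Betel, Doron and Alonso, Alicia
```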

Alternatively, you can add \r\n to the end of the regular expression (which then matches only lines that end in CRLF):

print join ' and ', map { /\AAU  - (.*)\r\n/ } <$fh>;

Answer 3

Change the regular expression to this. It works for text files in both DOS and UNIX format.

if ($_ =~ m/^AU\s+-         #line starts with 'AU'
\s+                         #whitespace
([^\r\n]*)                  #author is any run of characters other than CR or LF, saved as $1
/x) {
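As a quick check of this pattern, the following sketch runs it against one CRLF-terminated line and one LF-terminated line. The sample lines are hypothetical and held in memory rather than read from the file.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one with DOS (CRLF) and one with UNIX (LF) endings.
my @lines = ("AU  - Mason, Christopher E\r\n", "AU  - Melnick, Ari M\n");

my @authors;
for my $line (@lines) {
    if ($line =~ m/^AU\s+-    # line starts with 'AU', whitespace, then a hyphen
        \s+                   # whitespace
        ([^\r\n]*)            # author: everything up to the first CR or LF
        /x) {
        push @authors, $1;
    }
}

print join(' and ', @authors), "\n";    # Mason, Christopher E and Melnick, Ari M
```

Because the character class excludes both CR and LF, the capture stops before the line ending in either format, so no separate chomp or conversion step is needed.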
