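How to split an entire string into an array in Perl

Asked: 2015-10-28 20:47:32

Tags: arrays string perl

I am trying to process an entire string, but the way my code is written leaves part of it unprocessed. Here is a representation of my code: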

#!/usr/bin/perl
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
              VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
              CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
              LRDVVVGRHPLHLLEDAVTKPELRPCPTP";

$string =~ s/\s+//g;     # remove white space from string
# split the string into fragments of 58 characters and store in array
my @array = $string =~ /[A-Z]{58}/g;   
my $len = scalar @array;
print $len . "\n";    # this prints 3
# print the fragments
print $array[0] . "\n";
print $array[1] . "\n";
print $array[2] . "\n";
print $array[3] . "\n";

The code prints the following:

3
MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD
PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF
VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL
<blank space> 

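Notice that the remainder of the string, EDAVTKPELRPCPTP, is not stored in @array. How can I store EDAVTKPELRPCPTP when I create the array? Perhaps it could go in $array[3].

4 answers:

Answer 0 (score: 5)

You almost have it. You need to change the regex to allow between 1 and 58 characters.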

my @array = $string =~ /[A-Z]{1,58}/g;

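Also, your script has an error: it uses @prot_seq instead of @array. You should always use strict to protect yourself from this kind of mistake. Here is the script with strict, warnings, and the 5.10 features enabled (to get say).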

#!/usr/bin/perl

use strict;
use warnings;
use v5.10;

my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
              VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
              CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
              LRDVVVGRHPLHLLEDAVTKPELRPCPTP";

# Strip whitespace.
$string =~ s/\s+//g;

# Split the string into fragments of 58 characters or less
my @fragments = $string =~ /[A-Z]{1,58}/g;

say "Num fragments: ".scalar @fragments;
say join "\n", @fragments;
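
To see why the quantifier matters, here is a minimal sketch that uses a short made-up string and a chunk width of 4 in place of 58. Both the string and the width are assumptions chosen for readability, not values from the question.

```perl
use strict;
use warnings;

# Hypothetical short input; width 4 stands in for 58.
my $demo = "ABCDEFGHIJ";

# Exactly 4 characters per match: the trailing "IJ" can never match.
my @exact = $demo =~ /[A-Z]{4}/g;

# 1 to 4 characters per match: the quantifier is greedy, so full
# chunks are taken first and the short remainder comes last.
my @upto = $demo =~ /[A-Z]{1,4}/g;

print "exact: @exact\n";   # exact: ABCD EFGH
print "upto:  @upto\n";    # upto:  ABCD EFGH IJ
```

The same reasoning applies at width 58: the last 15 characters of the sequence only match once the lower bound is relaxed.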

Answer 1 (score: 2)

What you're missing is the ability to capture fewer than 58 characters. And if you only want to do that at the end of the string, you can do this:

/[A-Z]{58}|[A-Z]{1,57}\z/

Which I prefer to write like this:

/\p{Upper}{58}|\p{Upper}{1,57}\z/

However, because quantifiers are greedy by default, the following expression also prefers to collect 58 characters and only settles for fewer when it runs out of matching input:

/\p{Upper}{1,58}/

Or, for the reasons Schwern mentioned (such as avoiding any foreign letters):

/[A-Z]{1,58}/
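
The difference between anchoring the short match with \z and allowing a short match anywhere only shows up when the input contains a character outside the class. The sketch below illustrates this with a made-up string containing a stray lowercase "x" and a width of 3 in place of 58; both are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with a stray lowercase "x"; width 3 stands in for 58.
my $demo = "ABCDExFG";

# A short chunk is allowed only at the very end of the string,
# so "DE" (followed by "x", not end-of-string) is skipped.
my @anchored = $demo =~ /[A-Z]{3}|[A-Z]{1,2}\z/g;

# A short chunk is allowed anywhere, so "DE" is kept.
my @loose = $demo =~ /[A-Z]{1,3}/g;

print "anchored: @anchored\n";   # anchored: ABC FG
print "loose:    @loose\n";      # loose:    ABC DE FG
```

For input that has already been stripped down to uppercase letters, as in the question, the two forms should return the same list.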

Answer 2 (score: 2)

You may prefer to use unpack, like this

$string =~ s/\s+//g;    
my @fragments = unpack '(A58)*', $string;

Or, if you want to leave $string unchanged and have Perl v5.14 or later, you can write

my @fragments = unpack '(A58)*', $string =~ s/\s+//gr;
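
As a small sketch of how the '(A58)*' template behaves, the example below uses a width of 3 and made-up strings (assumptions for illustration). It also shows one detail worth knowing: the A format strips trailing spaces from each field, while a keeps them. That makes no difference once the whitespace has been removed, but it matters for other data.

```perl
use strict;
use warnings;

# Hypothetical input; width 3 stands in for 58.
# The group (A3) repeats until the string is exhausted, and the
# last field simply takes whatever characters remain.
my @plain = unpack '(A3)*', "ABCDEFGH";

# 'A' strips trailing spaces from each field, 'a' keeps them.
my @stripped = unpack '(A3)*', "AB CD ";
my @kept     = unpack '(a3)*', "AB CD ";

print join('', map { "[$_]" } @plain),    "\n";   # [ABC][DEF][GH]
print join('', map { "[$_]" } @stripped), "\n";   # [AB][CD]
print join('', map { "[$_]" } @kept),     "\n";   # [AB ][CD ]
```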

Answer 3 (score: 1)

If you don't really need the regex character class, this is how I would do it:

use strict;
use warnings;
use Data::Dump;

my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
              VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
              CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
              LRDVVVGRHPLHLLEDAVTKPELRPCPTP";

$string =~ s/\s+//g;

my @chunks;

while (length($string)) {
    push(@chunks, substr($string, 0, 58, ''));
}

dd($string, \@chunks);

Output:

(
  "",
  [
    "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD",
    "PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF",
    "VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL",
    "EDAVTKPELRPCPTP",
  ],
)
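
Note that the four-argument substr above consumes $string, which is why it ends up empty in the output. If the original string needs to survive, one possible variation is to walk an offset instead. The helper name chunk_string and the short test string below are hypothetical, introduced only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns chunks of at most $width characters
# without modifying the caller's string.
sub chunk_string {
    my ($text, $width) = @_;
    die "width must be positive\n" unless $width > 0;

    my @chunks;
    for (my $pos = 0; $pos < length $text; $pos += $width) {
        # Three-argument substr only reads; the final call returns
        # whatever is left when fewer than $width characters remain.
        push @chunks, substr($text, $pos, $width);
    }
    return @chunks;
}

my $seq    = "ABCDEFGHIJ";
my @chunks = chunk_string($seq, 4);

print "@chunks\n";   # ABCD EFGH IJ
print "$seq\n";      # ABCDEFGHIJ (unchanged)
```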