I need to create substrings of a main string using different window lengths

Time: 2014-07-29 16:40:23

Tags: perl substring

I have the following sequence

ABCDEFGHIJKLMNOPQRSTUVWXYZ

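Here I have a substring, JKLM. I want to create substrings of different lengths around it. For example, I want every sequence of length 6 to 12 that contains JKLM, extending on either side of it.

The answer should look like this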

6 length
HIJKLM,
IJKLMN,
JKLMNO

7 length
GHIJKLM,
HIJKLMN,
IJKLMNO,
JKLMNOP

8 length ..... and so on

I am very new to programming, so it would be great if someone could provide source code in Perl.

2 answers:

Answer 0: (score: 1)

Use a positive lookahead assertion to allow the matches to overlap.
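As a minimal sketch of why the lookahead matters (the string 'aaaa' and the variable names are made up for illustration and are not part of the question), compare an ordinary global match with the same pattern wrapped in a lookahead:

```perl
use strict;
use warnings;

my $text = 'aaaa';

# An ordinary match consumes the characters it matches, so the next
# attempt starts after them and overlapping hits are skipped.
my @plain = $text =~ /(aa)/g;

# Wrapping the pattern in a lookahead makes each match zero-width,
# so the next attempt starts just one character further along.
my @overlap = $text =~ /(?=(aa))/g;

print "plain:   @plain\n";
print "overlap: @overlap\n";
```

The plain version finds two non-overlapping hits, while the lookahead version finds three, one for each starting position where 'aa' fits.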

The following shows the results for strings of length 6 to 9, but it is easily extended:

use strict;
use warnings;

my $string = join '', 'A'..'Z';
my $search = 'JKLM';

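for my $len (6..9) {
    # How far to the left of $search a window of this length may start
    my $dist = $len - length $search;
    # First lookahead: $search begins within the next $dist characters.
    # Second lookahead: capture $len characters without consuming them,
    # so the next match attempt can start one character later.
    while ($string =~ m/(?=.{0,$dist}\Q$search\E)(?=(.{$len}))/g) {
        print "$1\n";
    }
}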

Output:

HIJKLM
IJKLMN
JKLMNO
GHIJKLM
HIJKLMN
IJKLMNO
JKLMNOP
FGHIJKLM
GHIJKLMN
HIJKLMNO
IJKLMNOP
JKLMNOPQ
EFGHIJKLM
FGHIJKLMN
GHIJKLMNO
HIJKLMNOP
IJKLMNOPQ
JKLMNOPQR

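With less sophisticated tools, you could also build this list of values using index and substr. In fact, if you are new to programming, those are the tools I would suggest you learn first.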
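A rough sketch of that index and substr approach might look like the following. It is not taken from the answer itself: the variable names ($pos, $first, $last, $max_start, @results) are chosen for illustration, only the first occurrence of the search string is considered, and the 6 to 12 range comes from the question.

```perl
use strict;
use warnings;

my $string = join '', 'A'..'Z';
my $search = 'JKLM';

# index returns the zero-based offset of the first occurrence, or -1
my $pos = index $string, $search;
die "'$search' not found\n" if $pos < 0;

my @results;
for my $len (6 .. 12) {
    # Earliest start: the window must still reach the end of $search
    my $first = $pos + length($search) - $len;
    $first = 0 if $first < 0;

    # Latest start: at $search itself, unless the window would run
    # past the end of $string
    my $last      = $pos;
    my $max_start = length($string) - $len;
    $last = $max_start if $last > $max_start;

    # An empty range (first > last) simply yields no windows
    for my $start ($first .. $last) {
        push @results, substr $string, $start, $len;
    }
}

print "$_\n" for @results;
```

For length 6 this produces HIJKLM, IJKLMN and JKLMNO, matching the first group in the question. The two clamping steps only come into play when the search string sits close to either end of the main string.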

Answer 1: (score: 1)

Something like this should work for you

use strict;
use warnings;

my $s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

for my $size (6, 7, 8, 13) {
  printf "%d length\n", $size;
  print "$_\n" for windows($s, 'JKLM', $size);
  print "\n";
}

sub windows {

  my ($str, $substr, $size) = @_;

  return unless $str =~ /\Q$substr\E/;

  my ($strlen, $substrlen) = map length, $str, $substr;
  return if $substrlen > $size;

  my $start = $-[0];
  my $end = $start + $substrlen;

  # Calculate the earliest offset so that the string contains
  # the whole window and the window contains the whole substring
  #
  my $wfirst = $end - $size;
  $wfirst = 0 if $wfirst < 0;

  # Calculate the latest offset so that the string contains
  # the whole window and the window contains the whole substring
  #
  my $wlast = $start + $size;
  $wlast = $strlen if $wlast > $strlen;
  $wlast -= $size;

  map { substr $str, $_, $size } $wfirst .. $wlast;
}

Output

6 length
HIJKLM
IJKLMN
JKLMNO

7 length
GHIJKLM
HIJKLMN
IJKLMNO
JKLMNOP

8 length
FGHIJKLM
GHIJKLMN
HIJKLMNO
IJKLMNOP
JKLMNOPQ

13 length
ABCDEFGHIJKLM
BCDEFGHIJKLMN
CDEFGHIJKLMNO
DEFGHIJKLMNOP
EFGHIJKLMNOPQ
FGHIJKLMNOPQR
GHIJKLMNOPQRS
HIJKLMNOPQRST
IJKLMNOPQRSTU
JKLMNOPQRSTUV