Match two strings based on a common substring

Date: 2015-11-23 00:10:39

Tags: perl

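I have a list of files that need to be grouped in pairs. (I need to append an HTML "File B" (the body) to "File A" (the header), because I need to serve them statically without server-side includes.)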

Example:

/path/to/headers/.../matching_folder/FileA.html
/someother/path/to/.../matching_folder/body/FileB.html

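The ellipses emphasize that the paths are not of uniform length, and "matching_folder" is not always at the same position in the path.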

It seems I need to match/join the paths on the common substring "matching_folder", but I'm struggling to scan each string, store it, and match (excerpt):

use File::Find;

my @dirs = ( $headerPath, $bodyPath );

my @files = ();

find( { wanted => \&wanted, no_chdir => 1 }, @dirs );

foreach my $file (@files) {
    # pseudocode: append $file[0] to $file[1] if both paths contain same 'matching_folder'
}

sub wanted {
    # with no_chdir => 1, $_ holds the full path of the current file
    return unless -f and /(FileA\.html$)|(FileB\.html$)/i;
    push @files, $_;
}
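Once a header and its body have been paired, the "append" half of the pseudocode is an ordinary file copy. A minimal sketch, assuming that appending in place to File A is what is wanted; `append_file` is a hypothetical helper name, not something from the question or the answer:

```perl
use strict;
use warnings;

# Hypothetical helper: append the contents of $body_path to the end of
# $header_path, modifying the header file in place.
sub append_file {
    my ($header_path, $body_path) = @_;
    open my $in,  '<',  $body_path   or die "Cannot read $body_path: $!";
    open my $out, '>>', $header_path or die "Cannot append to $header_path: $!";
    # Copy bytes unchanged so the HTML is not re-encoded.
    binmode $in;
    binmode $out;
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $header_path: $!";
    return;
}
```

Because it appends in place, running it twice on the same pair duplicates the body; if the script may be rerun, writing header and body into a separate output file is the safer variant.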

1 answer:

Answer (score: 1)

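Hash the files by every directory step in their paths: each step becomes a hash key whose value lists the files containing it, so a folder name that occurs in both a header path and a body path collects both files.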

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use File::Find;

my $headerPath = 'headers';
my $bodyPath   = 'bodies';

my @dirs = ($headerPath, $bodyPath);
my @files;

sub wanted {
    return unless -f and /file.\.html$/;
    push @files, $_;
}

find({ wanted => \&wanted, no_chdir => 1 }, @dirs);

# Map every path step (directory or file name) to the files containing it.
my %common;
for my $file (@files) {
    my @steps = split m(/), $file;
    push @{ $common{$_} }, $file for @steps;
}

# All the headers and all the bodies share their prefixes,
# but that's not what we're interested in.
delete @common{qw{ bodies headers }};

# Report only the steps shared by more than one file.
for my $step (keys %common) {
    next if 1 == @{ $common{$step} };
    say "$step common for @{ $common{$step} }";
}

Tested with the following structure:

bodies/3/something/C/something2/fileA.html
bodies/2/junk/B/fileB.html
bodies/1/A/fileC.html
headers/a/B/fileD.html
headers/c/one/A/two/fileE.html
headers/b/garbage/C/fileF.html

Output (the order of the lines can differ between runs, because Perl's hash key order is not fixed):

B common for headers/a/B/fileD.html bodies/2/junk/B/fileB.html
C common for headers/b/garbage/C/fileF.html bodies/3/something/C/something2/fileA.html
A common for headers/c/one/A/two/fileE.html bodies/1/A/fileC.html
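The loop above reports every step shared by more than one file, so a folder name that happens to occur in two body paths (or two header paths) would be reported as well, and the root names have to be deleted by hand. A minimal sketch of a stricter variant, assuming the two root directories are passed in explicitly and are not nested inside each other (`pair_by_common_step` is a hypothetical helper name), keeps only the steps that occur in exactly one header path and exactly one body path:

```perl
use strict;
use warnings;

# Hypothetical helper: pair header and body files that share a directory
# name. $header_root and $body_root are the directories given to find();
# they are assumed not to be nested inside each other.
sub pair_by_common_step {
    my ($header_root, $body_root, @paths) = @_;
    my %by_step;    # step => { header => [paths], body => [paths] }

    for my $path (@paths) {
        my ($kind, $rest);
        if (index($path, "$header_root/") == 0) {
            ($kind, $rest) = ('header', substr($path, length($header_root) + 1));
        }
        elsif (index($path, "$body_root/") == 0) {
            ($kind, $rest) = ('body', substr($path, length($body_root) + 1));
        }
        else {
            next;    # not under either root
        }

        my @steps = split m{/}, $rest;
        pop @steps;    # drop the file name itself

        # Count each directory name once per path, even if it repeats.
        my %seen;
        push @{ $by_step{$_}{$kind} }, $path for grep { !$seen{$_}++ } @steps;
    }

    my %pairs;
    for my $step (keys %by_step) {
        my ($headers, $bodies) = @{ $by_step{$step} }{qw(header body)};
        # Keep only unambiguous matches: one header and one body.
        next unless $headers && $bodies && @$headers == 1 && @$bodies == 1;
        $pairs{$step} = [ $headers->[0], $bodies->[0] ];
    }
    return \%pairs;
}
```

With the test tree above, `pair_by_common_step('headers', 'bodies', @files)` should yield the same three groups (A, B and C), each as a [header, body] pair ready for concatenation; adding a second body under some `B` directory would drop `B` instead of reporting three files.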